Training evals/experiments

#2
by borgr - opened

Hi,
I saw the training loss in a figure (https://medium.com/snowflake/snowflake-arctic-cookbook-series-exploring-mixture-of-experts-moe-c7d6b8f14d16), as well as the losses of ablated models, etc. Could you share these for future research?
