Training evals/experiments
#2, opened by borgr
Hi,
I saw the training loss plotted in a figure in this post (https://medium.com/snowflake/snowflake-arctic-cookbook-series-exploring-mixture-of-experts-moe-c7d6b8f14d16), along with the losses of the ablated models and similar experiments. Could you share the underlying data for future research?