Training loss for reproduction

by yjkimstats - opened


Following your feedback, I've now started training the model for reproduction!
Using the sampling ratios given in the paper, I believe I can compose the same training dataset, except for the GPT-generated synthetic data.
However, during training I'm getting strange loss curves that stop decreasing after several iterations.
Could you share some information about the training dynamics (e.g., the converged training loss or the loss curve)?

Thank you!
