How long was this model trained?

#8
by jploski - opened

How many steps/epochs was this particular model trained for? And which of the datasets was used: was it https://huggingface.co/datasets/roneneldan/TinyStories/tree/main/TinyStories-train.txt?

I can only find Figure 3 in the paper showing 2.5K steps. Am I right that it translates into ~1.5 epochs using the TinyStories-train.txt dataset and parameters from the model card?

About 20 epochs. Context length 512, batch size 80 (20 per device across 4 V100 GPUs), 16 gradient accumulation steps. Learning rate 5e-4, weight decay 0.1, Adam betas (0.9, 0.95). The file used for training was indeed https://huggingface.co/datasets/roneneldan/TinyStories/blob/main/TinyStories-train.txt.
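To make the numbers above concrete, here is a minimal sketch that derives the effective batch size and tokens consumed per optimizer step from the stated hyperparameters (batch 20 per device, 4 GPUs, 16 gradient accumulation steps, context length 512); these are the only inputs, nothing else is assumed.

```python
# Derive effective batch size and tokens per optimizer update
# from the hyperparameters quoted in the reply above.

per_device_batch = 20    # sequences per GPU
num_gpus = 4             # V100s
grad_accum_steps = 16    # gradient accumulation
context_length = 512     # tokens per sequence

# Sequences contributing to each optimizer update
effective_batch = per_device_batch * num_gpus * grad_accum_steps

# Tokens consumed per optimizer update
tokens_per_step = effective_batch * context_length

print(effective_batch)   # 1280 sequences
print(tokens_per_step)   # 655360 tokens
```

So each optimizer step processes 1,280 sequences (~655K tokens); total training tokens would follow once the dataset's token count is known, which is not stated in the thread.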

Thanks for the detailed info!

jploski changed discussion status to closed