Intermediate checkpoints for research purposes

#11
by maveriq - opened

Hi!

I was wondering whether you plan to share intermediate checkpoints, to enable comparison with models that use different architectures and are trained on fewer tokens of the same dataset. Checkpoints at 10B-50B tokens would be especially useful for those with limited compute resources.

Thanks!

@loubnabnl
Second this, it would be great to have. What about the wandb plots? Are you planning to release those?
Thanks!

@eliebak @loubnabnl

Do you have any plans to release the intermediate checkpoints, with or without cooldown? Checkpoints every 100k steps, like the ones used in the following figure from your blog post, would be immensely useful for research purposes.
[Image: Untitled 11.png]

Thanks!
