
Checkpoints

#15
by borgr - opened

There are multiple checkpoints mentioned, all inside the OLMo-7B repo. How can one of them have the LR annealed to 0 while a later one in the same repo does not? And what does that mean for the rest of the checkpoints found in the repo?

Allen Institute for AI org

Hi @borgr, for the revisions from step 0 to step 556k we follow a linear LR schedule, and then over the last 1000 steps we anneal the LR to 0. We found this to be better for the performance of the final model.
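Roughly, the schedule looks like this (a minimal sketch; the peak LR, schedule horizon, and warmup handling below are placeholder assumptions, not the exact training config):

```python
def lr_at_step(step, peak_lr=3e-4, schedule_steps=739_000,
               anneal_start=556_000, anneal_steps=1_000):
    """Illustrative only: a linear decay that would not reach 0 on its own,
    with the last `anneal_steps` steps overridden to anneal the LR to 0.
    peak_lr and schedule_steps are placeholders, not OLMo's actual values."""
    # LR the plain linear schedule would give at this step
    linear_lr = peak_lr * max(0.0, 1 - step / schedule_steps)
    if step < anneal_start:
        # revisions up to step 556k follow the linear schedule
        return linear_lr
    # final ~1000 steps: interpolate from the schedule's current LR down to 0
    lr_at_anneal_start = peak_lr * (1 - anneal_start / schedule_steps)
    frac = min(1.0, (step - anneal_start) / anneal_steps)
    return lr_at_anneal_start * (1 - frac)
```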

I think I didn't put the question well

I find the differences between those checkpoints unclear, specifically the ones that are part of allenai/OLMo-7B. How can the non-annealed one be the one with more tokens, batches, and steps?
(screenshot of the repo's checkpoint table)

Allen Institute for AI org

@borgr This might make it clearer:

| Revision | Tokens | LR schedule |
| --- | --- | --- |
| OLMo-7B step452k | 2T | following linear schedule (not annealed) |
| OLMo-7B step556k | 2.460T | still following linear schedule (not annealed) |
| OLMo-7B step557k (main) | 2.464T | LR annealed to 0 |
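Each of these is a separate revision of the same repo, so you can load any of them by passing `revision=` to `from_pretrained`. A minimal sketch; the branch name below is an assumed example and should be checked against the repo's revision list:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "allenai/OLMo-7B"
# Assumed example branch name for the 2.460T-token checkpoint; check the
# repo's list of revisions for the exact string.
revision = "step556000-tokens2460B"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    revision=revision,       # omit this to get the annealed "main" checkpoint
    trust_remote_code=True,  # the repo ships custom modeling code
)
```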

Maybe then put something comparable in the NAME and Note columns for the second and third rows?
