Checkpoint "step200000-tokens838B" appears to be the fully trained model
Thank you for your great work!
I tried to evaluate the checkpoints on the Korean benchmark Haerae Bench to analyze how OLMo's multilingual ability evolves over the pre-training steps. The results revealed that the "step200000-tokens838B" checkpoint's performance is identical to that of the fully trained "main" checkpoint. There seems to have been an error when the "step200000-tokens838B" checkpoint was saved. Please check whether anything went wrong with the "step200000-tokens838B" checkpoint!
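For reference, here is a minimal sketch of how each intermediate checkpoint can be loaded for evaluation, assuming the checkpoints are published as revisions (branches) of an HF model repo. The repo id `allenai/OLMo-7B` is an assumption on my part; the revision name is the one from this issue.

```python
# Minimal sketch of loading one intermediate checkpoint for evaluation,
# assuming checkpoints are published as revisions of an HF model repo.
# The repo id "allenai/OLMo-7B" is an assumption; the revision name is
# the one reported in this issue. Depending on your transformers
# version, OLMo may additionally require trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "allenai/OLMo-7B"            # assumed repo id
REVISION = "step200000-tokens838B"  # checkpoint named in this issue

tokenizer = AutoTokenizer.from_pretrained(REPO, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(
    REPO, revision=REVISION, torch_dtype=torch.float32
)
model.eval()

# Quick smoke test: the per-token loss on a short prompt should differ
# between genuinely different training steps.
inputs = tokenizer("Language models are", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, labels=inputs["input_ids"])
print(f"{REVISION} loss: {out.loss.item():.4f}")
```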
Thank you for pointing this out! Unfortunately, you are right. Something went wrong with the upload-to-HF job for some of the checkpoints, which we will investigate.
Steps 200k to 251k incorrectly match the fully trained model. We will try to fix them quickly. Please let us know if you see any other incorrect checkpoints.
Thank you for your kind response! Unfortunately, the experiment was only conducted on the checkpoints shown in the table, so I am not sure about the other checkpoints. 🥲 I will let you know if I see any strange results in future experiments!
Steps 200k to 251k are now updated, as far as I can tell. Please let us know if you encounter any other issues.
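If anyone wants to double-check other checkpoints without re-running a full benchmark, one quick way is to hash the parameter tensors of a revision and compare against `main`. This is only a sketch, and the repo id `allenai/OLMo-7B` is again an assumption:

```python
# Hedged sketch: detect checkpoints that accidentally duplicate "main"
# by hashing every parameter tensor. This downloads the full weights,
# so it is slow for large models; the repo id below is an assumption.
import hashlib

import torch
from transformers import AutoModelForCausalLM

REPO = "allenai/OLMo-7B"  # assumed repo id


def state_dict_digest(revision: str) -> str:
    """Return one SHA-256 digest over all parameters of a revision."""
    model = AutoModelForCausalLM.from_pretrained(
        REPO, revision=revision, torch_dtype=torch.float32
    )
    h = hashlib.sha256()
    for name, tensor in sorted(model.state_dict().items()):
        h.update(name.encode())
        h.update(tensor.cpu().contiguous().numpy().tobytes())
    return h.hexdigest()


if state_dict_digest("step200000-tokens838B") == state_dict_digest("main"):
    print("Checkpoint is identical to main, likely a bad upload.")
else:
    print("Checkpoint differs from main, as expected.")
```

Hashing at float32 avoids bfloat16-to-NumPy conversion issues; a lighter alternative might be comparing the LFS file hashes exposed by the Hub API, without downloading weights at all.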