Name changes for smaller BLOOM models

#73
by eliolio - opened

Hi, I'm posting this here since it concerns 4 Bloom models. I hope it's okay 😅
I noticed that the BLOOM models 350m, 760m, 1b3, and 6b3 have been renamed and are now bloom-560m, bloom-1b1, bloom-1b7, and bloom-7b1, respectively.

| Old model | bloom-350m | bloom-760m | bloom-1b3 | bloom-6b3 |
| --- | --- | --- | --- | --- |
| New model | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-7b1 |

From the commit history, I believe that the actual models are still the same (so just a name change). I simply wanted to point it out here since it can be a bit confusing.
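
For scripts that still refer to the old names, a small lookup along these lines might help. This is only a sketch built from the table above; `RENAMED_BLOOM_MODELS` and `current_name` are hypothetical names I made up for illustration, not part of any library.

```python
# Old BLOOM model names mapped to their new names, as listed in the table above.
RENAMED_BLOOM_MODELS = {
    "bloom-350m": "bloom-560m",
    "bloom-760m": "bloom-1b1",
    "bloom-1b3": "bloom-1b7",
    "bloom-6b3": "bloom-7b1",
}


def current_name(model_name: str) -> str:
    """Return the new name for an old BLOOM model name.

    Names that are not in the rename table are returned unchanged.
    The lookup is case-insensitive, so "bloom-350M" also resolves.
    """
    return RENAMED_BLOOM_MODELS.get(model_name.lower(), model_name)


if __name__ == "__main__":
    for old in RENAMED_BLOOM_MODELS:
        print(f"{old} -> {current_name(old)}")
```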

BigScience Workshop org
edited Aug 10, 2022

Hey, thanks for raising this!
We changed the names because the previous parameter counts were incorrect. The new names reflect the correct parameter counts, including embedding parameters. You can find details in the Technical Specifications section of each model's README or by inspecting the model files.
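
To illustrate how much the embedding parameters contribute, here is a rough back-of-the-envelope count. It is a sketch under stated assumptions, not the official accounting: it assumes a BLOOM-style decoder block (fused QKV projection, attention output projection, a 4x MLP, two layer norms per block, all with biases), an extra layer norm after the embeddings, a final layer norm, and an output head tied to the word embeddings. The configuration values used for bloom-560m (hidden size 1024, 24 layers, vocabulary size 250880) are also assumptions and should be checked against the model's `config.json`. The function name `bloom_param_count` is made up for this example.

```python
def bloom_param_count(hidden_size: int, n_layer: int, vocab_size: int) -> dict:
    """Estimate parameter counts for a BLOOM-style decoder (assumed layout)."""
    h = hidden_size

    # Word embedding matrix; the output head is assumed to share these weights.
    embedding = vocab_size * h

    per_block = (
        2 * h                # input layer norm (weight + bias)
        + 3 * h * h + 3 * h  # fused query/key/value projection
        + h * h + h          # attention output projection
        + 2 * h              # post-attention layer norm
        + 4 * h * h + 4 * h  # MLP up-projection (h -> 4h)
        + 4 * h * h + h      # MLP down-projection (4h -> h)
    )

    # Layer norm after the embeddings plus the final layer norm.
    extra_norms = 2 * h + 2 * h

    non_embedding = n_layer * per_block + extra_norms
    return {
        "embedding": embedding,
        "non_embedding": non_embedding,
        "total": embedding + non_embedding,
    }


if __name__ == "__main__":
    # Assumed configuration for bloom-560m; verify against config.json.
    counts = bloom_param_count(hidden_size=1024, n_layer=24, vocab_size=250880)
    for name, value in counts.items():
        print(f"{name}: {value:,}")
```

Under these assumptions the total comes to roughly 559 million, of which about 257 million are embedding parameters, which is consistent with the new bloom-560m name.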

Note that the models themselves have also changed. Their weight files have been updated to the final pre-trained checkpoints; the previous checkpoints were intermediate.
We recommend that everyone update their cached model files, as the previous intermediate checkpoints are inferior. You can look at the commit history to see exactly when the final checkpoint was uploaded (it is the latest commit replacing the pytorch_model.bin file).
🌸🌸🌸
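
One possible way to refresh a cached copy is to pass `force_download=True` when loading with `transformers`. This is a minimal sketch, assuming the models live under the `bigscience/` namespace on the Hub and that `transformers` is installed; `hub_repo_id` and `refresh_bloom_checkpoint` are hypothetical helper names used only for this example.

```python
def hub_repo_id(model_name: str) -> str:
    """Build the Hub repository id, assuming the bigscience/ namespace."""
    return f"bigscience/{model_name}"


def refresh_bloom_checkpoint(model_name: str = "bloom-560m"):
    """Re-download the tokenizer and weights, ignoring any cached files."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = hub_repo_id(model_name)
    # force_download=True makes from_pretrained fetch the files again
    # instead of reusing whatever is already in the local cache.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, force_download=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, force_download=True)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = refresh_bloom_checkpoint("bloom-560m")
    print(type(model).__name__)
```

Calling `refresh_bloom_checkpoint` downloads the full weight file again, so it is best run once per model rather than on every load.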

Muennighoff changed discussion status to closed
