Difference to bigscience/bloom-350m

by malteos - opened

PLEASE DO NOT USE THIS MODEL! IT WILL BE REMOVED SOON. USE https://huggingface.co/bigscience/bloom-560m INSTEAD, WHICH IS THE SAME MODEL.

Is the different name just due to a miscounted number of parameters, or is there some other difference? The hashes of the model weights, at least, are identical.
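For anyone who wants to repeat the hash comparison, here is a minimal sketch. It assumes both weight files have already been downloaded locally; the function names `sha256_of_file` and `same_weights` and the example paths are hypothetical, not taken from either repository.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_weights(path_a, path_b):
    """True if the two files are byte-for-byte identical according to SHA-256."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


# Hypothetical usage; replace the paths with your local copies of the weight files:
# same_weights("bloom-350m/pytorch_model.bin", "bloom-560m/pytorch_model.bin")
```

Matching digests only show that the stored weight files are identical; they say nothing about differences in configs or tokenizer files, which would need to be compared separately.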

BigScience Workshop org

Yes, it's just naming. The 350m name is kept only for backwards compatibility, and it will be removed soon.

Alright, but isn't the Slurm script producing a 350m model and not a 560m model?

BigScience Workshop org

It's the same model. The two names reflect different parameter counts, depending on whether or not you count the embedding parameters.
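To make the two ways of counting concrete, here is a rough back-of-the-envelope sketch. It assumes a BLOOM-style decoder with tied input/output embeddings, and it assumes a hidden size of 1024, 24 layers, and a vocabulary of 250880 tokens; these values are not stated in this thread, so please check them against the model's config before relying on the numbers. The function name `bloom_style_param_counts` is hypothetical.

```python
def bloom_style_param_counts(hidden_size, n_layer, vocab_size):
    """Estimate parameter counts for a BLOOM-style decoder (assumed layout).

    Returns (embedding_params, non_embedding_params, total_params).
    Assumes the output head shares weights with the input embedding.
    """
    h = hidden_size

    # Token embedding matrix: one row of size h per vocabulary entry.
    embedding = vocab_size * h

    # One transformer block (weights + biases):
    layer_norms = 2 * (2 * h)             # two LayerNorms, scale and bias each
    qkv = h * 3 * h + 3 * h               # fused query/key/value projection
    attn_out = h * h + h                  # attention output projection
    mlp_up = h * 4 * h + 4 * h            # MLP expansion to 4h
    mlp_down = 4 * h * h + h              # MLP projection back to h
    per_layer = layer_norms + qkv + attn_out + mlp_up + mlp_down

    # LayerNorm after the embedding and final LayerNorm before the head.
    extra_norms = 2 * (2 * h)

    non_embedding = n_layer * per_layer + extra_norms
    return embedding, non_embedding, embedding + non_embedding


# Assumed configuration values (not taken from this thread).
emb, non_emb, total = bloom_style_param_counts(hidden_size=1024, n_layer=24, vocab_size=250880)
print(f"embedding:     {emb:,}")
print(f"non-embedding: {non_emb:,}")
print(f"total:         {total:,}")
```

Under these assumptions the total comes out near 559M, of which roughly 257M sit in the embedding matrix and roughly 302M elsewhere. The large vocabulary is why including or excluding the embeddings changes the headline number so much; the non-embedding figure only loosely matches the "350m" label.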
