I wanted to know where I can download the DeepSpeed checkpoints for finetuning the BLOOM model.
As far as I know the model provided in HuggingFace has been reshaped from (TP = 8 x PP = 6) to (TP = 8).
Even if one could convert it to the original form from the HF checkpoint, having the original optimizer would be really helpful.
If anyone has any idea where to find the DeepSpeed checkpoints, please point me to it. 🤗
Thank you for your message!
I have just uploaded the latest DeepSpeed checkpoints here: https://huggingface.co/bigscience/bloom-optimizer-states (2.3 TB). They are from the model we are using on the inference API (global step 95000).
If you need more checkpoints from earlier global steps, please let us know by opening an issue on that repo so that we can take care of it.
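For anyone who wants to fetch only one step rather than the whole repo, here is a minimal sketch using `snapshot_download` from `huggingface_hub`. The folder layout (`global_step<N>/` at the top level of the repo) and the helper names `step_patterns` and `download_step` are assumptions for illustration, so check the repo's file listing before relying on them.

```python
from typing import List

REPO_ID = "bigscience/bloom-optimizer-states"


def step_patterns(global_step: int) -> List[str]:
    """Build the file patterns for one training step.

    Assumption: the repo keeps each step in a top-level folder
    named global_step<N>. Adjust if the actual layout differs.
    """
    return [f"global_step{global_step}/*"]


def download_step(global_step: int, local_dir: str) -> str:
    """Download only the files for one step and return the local path."""
    # Imported lazily so step_patterns() works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=step_patterns(global_step),
        local_dir=local_dir,
    )


# Example call (not run here, since the download is on the order of terabytes):
# path = download_step(95000, "/data/bloom-ds-checkpoints")
```

Filtering with `allow_patterns` avoids pulling files that do not belong to the requested step, which matters given the size of the repo.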
Thanks a lot 🤗, these checkpoints will be really helpful. Also, is there a way to convert HF checkpoints to DeepSpeed?
I see that there is a script to convert DeepSpeed checkpoints to HF in the Megatron-DeepSpeed repo. If there is a way to invert this, that would be really helpful.
As far as I know, there is a script to convert an HF model into DS format, but it is a bit hacky and you will need to adapt it for BLOOM (we have only tried it with OPT). You can probably start from this script and adapt it with the correct key names ;)
But the repo I linked already contains the DS-converted models + optimizer states. Just out of curiosity, may I ask why you would need to convert the model back into DS format?
Thanks a lot!
No particular reason. I just wanted to know how one would convert HF to DS if one wanted to.
Also, I had a question: any particular reason why BigScience uses ZeRO-DP instead of FSDP in PyTorch?
Is the reason just that Megatron-DeepSpeed is readily available from Microsoft?
Also, thanks a lot for providing the full model checkpoints. This will be really helpful to folks looking at finetuning BLOOM 176B.
Okay, got it, thanks!
I think that it would be a nice contribution indeed, but since HF models have very different key names across models (e.g. OPT has very different key names from BLOOM), we would probably need a separate script per model class.
Regarding your second question, I will probably let @stas answer that one ;)
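To illustrate the key-name point from the conversion discussion above, here is a rough sketch of what the per-model renaming step could look like. The rules in `EXAMPLE_RULES` and the function `rename_keys` are hypothetical and are not the real BLOOM or Megatron-DeepSpeed key names. A real converter would also have to re-shard the weights for tensor and pipeline parallelism, which this sketch does not attempt.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical (pattern, replacement) rules, for illustration only.
# Replace them with the key names found in your own checkpoints.
EXAMPLE_RULES: List[Tuple[str, str]] = [
    (r"^transformer\.h\.(\d+)\.", r"layers.\1."),
    (r"^transformer\.word_embeddings\.", r"embed."),
]


def rename_keys(
    state_dict: Dict[str, Any], rules: List[Tuple[str, str]]
) -> Dict[str, Any]:
    """Return a new dict whose keys are rewritten by the first matching rule.

    Keys that match no rule are copied unchanged.
    """
    renamed: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in rules:
            new_key, count = re.subn(pattern, replacement, key)
            if count:
                break  # Only the first matching rule is applied.
        if new_key in renamed:
            raise KeyError(f"two source keys map to the same name: {new_key}")
        renamed[new_key] = value
    return renamed


# Plain integers stand in for tensors to keep the example dependency-free.
example = {
    "transformer.h.3.mlp.weight": 1,
    "transformer.word_embeddings.weight": 2,
    "lm_head.weight": 3,
}
converted = rename_keys(example, EXAMPLE_RULES)
```

Because only the rule table changes from one architecture to another, a "script per model class" could in principle reduce to one rule list per model, with the renaming loop shared.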