Finetuning BLOOM 175-B

#54

by mayank-mishra - opened Jul 18, 2022

Jul 18, 2022

I wanted to know where I can download the DeepSpeed checkpoints for finetuning the BLOOM model.
As far as I know, the model provided on Hugging Face has been reshaped from (TP = 8 × PP = 6) to (TP = 8).
Even if one could convert the HF checkpoint back to its original layout, having the original optimizer states would be really helpful.

If anyone has any idea where to find the DeepSpeed checkpoints, please point me to them. 🤗
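Some background on the reshaping mentioned above: pipeline parallelism (PP) assigns whole layers to stages, so collapsing PP = 6 mostly means collecting every stage's layers under a single index, whereas tensor parallelism (TP) slices individual weight matrices across ranks. Merging or re-splitting TP shards then amounts to concatenating or chunking along the sharded dimension. The sketch below illustrates this with plain Python lists standing in for tensors. The function names are hypothetical, the sketch ignores any special handling that fused weights might need, and which dimension a given BLOOM weight is sharded along (rows for column-parallel layers, columns for row-parallel ones) is an assumption to verify against the Megatron-DeepSpeed code.

```python
from typing import List

# A 2-D weight represented as a list of rows (stand-in for a [out, in] tensor).
Matrix = List[List[float]]


def split_rows(weight: Matrix, tp: int) -> List[Matrix]:
    """Shard along dim 0 (output features), as for a column-parallel linear layer."""
    if len(weight) % tp != 0:
        raise ValueError("number of rows must be divisible by the TP degree")
    chunk = len(weight) // tp
    return [weight[r * chunk:(r + 1) * chunk] for r in range(tp)]


def merge_rows(shards: List[Matrix]) -> Matrix:
    """Inverse of split_rows: concatenate the shards along dim 0."""
    return [row for shard in shards for row in shard]


def split_cols(weight: Matrix, tp: int) -> List[Matrix]:
    """Shard along dim 1 (input features), as for a row-parallel linear layer."""
    n_cols = len(weight[0])
    if n_cols % tp != 0:
        raise ValueError("number of columns must be divisible by the TP degree")
    chunk = n_cols // tp
    return [[row[r * chunk:(r + 1) * chunk] for row in weight] for r in range(tp)]


def merge_cols(shards: List[Matrix]) -> Matrix:
    """Inverse of split_cols: rejoin each row's pieces along dim 1."""
    n_rows = len(shards[0])
    return [[x for shard in shards for x in shard[i]] for i in range(n_rows)]
```

For example, splitting an 8-row weight with `split_rows(w, 8)` gives eight one-row shards, and `merge_rows` restores the original matrix; the column variants behave the same way along the other dimension.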

ybelkada

BigScience Workshop org Jul 20, 2022

Hi @mayank31398
Thank you for your message!
I have just uploaded the latest DeepSpeed checkpoints here: https://huggingface.co/bigscience/bloom-optimizer-states (2.3 TB). They are from the model we are using on the Inference API (global step 95000).
If you need checkpoints from earlier global steps, please let us know by opening an issue on that repo so that we can take care of it.
Thanks a lot 🤗
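As a rough sanity check on the 2.3 TB figure, the size of a full training checkpoint can be estimated as parameter count × bytes stored per parameter. The sketch below assumes about 176e9 parameters, bf16 model weights (2 bytes), and fp32 master weights plus the two Adam moments (4 bytes each); the actual layout and any extra tensors or metadata in the repo are not confirmed here, so treat the result only as an order-of-magnitude estimate.

```python
def checkpoint_size_bytes(n_params: float, bytes_per_param: int) -> float:
    """Estimated checkpoint size: every parameter stores the same number of bytes."""
    return n_params * bytes_per_param


# Assumed layout: bf16 weights (2 B) + fp32 master weights, Adam exp_avg, exp_avg_sq (4 B each).
BYTES_PER_PARAM = 2 + 3 * 4  # 14 bytes per parameter
N_PARAMS = 176e9             # approximate BLOOM parameter count (assumption)

total = checkpoint_size_bytes(N_PARAMS, BYTES_PER_PARAM)
print(f"{total / 1e12:.2f} TB ({total / 2**40:.2f} TiB)")  # prints "2.46 TB (2.24 TiB)"
```

Under these assumptions the estimate is in the same range as the quoted size, which suggests the repo holds both the weights and the full optimizer states.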

mayank-mishra

Jul 20, 2022

Thanks @ybelkada, these checkpoints will be really helpful. Also, is there a way to convert HF checkpoints to DeepSpeed?
I see that there is a script on the Megatron-DeepSpeed repo to convert DeepSpeed checkpoints to HF. If there is a way to do the inverse, that would be really helpful.

ybelkada

BigScience Workshop org Jul 20, 2022

No worries!
As far as I know, there is a script to convert an HF model into DS format, but it is a bit hacky and you'll need to adapt it for BLOOM (we only tried it for OPT). You can probably start from that script and adapt it with the correct key names ;)
But the repo I linked already contains the DS-converted model + optimizer states. Just out of curiosity, may I ask why you would need to convert the model back into DS format?
Thanks a lot!
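Adapting such a script mostly comes down to a table of key-name rewrites. The sketch below shows that technique with a regex table applied to a state dict. The source patterns follow the HF BLOOM naming style (`transformer.h.<i>.…`), but the target names are hypothetical placeholders, not the real Megatron-DeepSpeed layout, which you would need to read off an actual DS checkpoint. Keys that match no rule raise an error instead of being silently dropped.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rename table: (regex on the HF key, replacement template).
# The targets are placeholders, not the real Megatron-DeepSpeed names.
RENAME_RULES: List[Tuple[str, str]] = [
    (r"^transformer\.word_embeddings\.(.+)$", r"embed.\1"),
    (r"^transformer\.h\.(\d+)\.(.+)$", r"layers.\1.\2"),
    (r"^transformer\.ln_f\.(.+)$", r"final_norm.\1"),
]


def rename_key(key: str, rules: List[Tuple[str, str]] = RENAME_RULES) -> str:
    """Apply the first matching rule; fail loudly if nothing matches."""
    for pattern, template in rules:
        new_key, n_subs = re.subn(pattern, template, key)
        if n_subs:
            return new_key
    raise KeyError(f"no rename rule matches {key!r}")


def rename_state_dict(state: Dict[str, object],
                      rules: List[Tuple[str, str]] = RENAME_RULES) -> Dict[str, object]:
    """Rename every key, refusing to let two source keys collide on one target."""
    renamed: Dict[str, object] = {}
    for key, value in state.items():
        new_key = rename_key(key, rules)
        if new_key in renamed:
            raise KeyError(f"two keys map to {new_key!r}")
        renamed[new_key] = value
    return renamed
```

Keeping the mapping as data rather than code means adapting the script to another model mostly means writing a new rule table.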

mayank-mishra

Jul 20, 2022

edited Jul 20, 2022

No particular reason, @ybelkada. I just wanted to know how one would convert HF to DS if they needed to.
Also, I had a question: is there any particular reason why BigScience uses ZeRO-DP instead of PyTorch's FSDP?
Is it just that Megatron-DeepSpeed is readily available from Microsoft?

Also, thanks a lot for providing the full model checkpoints. This will be really helpful to folks looking at finetuning BLOOM 175B.

ybelkada

BigScience Workshop org Jul 20, 2022

Okay, got it, thanks!
I think that would indeed be a nice contribution, but since HF models use very different key names across architectures (e.g. OPT's key names are very different from BLOOM's), we would probably need a separate script per model class.
Regarding your second question, I will probably let @stas answer that one ;)
Thanks again!
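One way to organise the "separate script per model class" idea is a registry keyed by the model type string (the `model_type` field in an HF config, e.g. `"bloom"` or `"opt"`), where each entry owns its own conversion function. The sketch below is a hypothetical layout: the two converters are placeholders that only rewrite the per-block prefix, not real BLOOM or OPT to DeepSpeed mappings.

```python
from typing import Callable, Dict

# A converter maps an HF state dict to a (hypothetical) DS-style state dict.
Converter = Callable[[Dict[str, object]], Dict[str, object]]
CONVERTERS: Dict[str, Converter] = {}


def register(model_type: str) -> Callable[[Converter], Converter]:
    """Decorator that records a converter under its model type."""
    def decorator(fn: Converter) -> Converter:
        CONVERTERS[model_type] = fn
        return fn
    return decorator


@register("bloom")
def convert_bloom(state: Dict[str, object]) -> Dict[str, object]:
    # Placeholder: BLOOM blocks live under "transformer.h.<i>." in HF checkpoints.
    return {k.replace("transformer.h.", "layers."): v for k, v in state.items()}


@register("opt")
def convert_opt(state: Dict[str, object]) -> Dict[str, object]:
    # Placeholder: OPT blocks live under "model.decoder.layers.<i>." in HF checkpoints.
    return {k.replace("model.decoder.layers.", "layers."): v for k, v in state.items()}


def convert(model_type: str, state: Dict[str, object]) -> Dict[str, object]:
    """Dispatch to the converter registered for this model type."""
    try:
        fn = CONVERTERS[model_type]
    except KeyError:
        raise ValueError(f"no converter registered for {model_type!r}") from None
    return fn(state)
```

Adding support for a new architecture then means registering one more function, without touching the existing ones.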

mayank-mishra

Jul 20, 2022

Closing this issue.

mayank-mishra changed discussion status to closed Jul 20, 2022
