DeepSpeed ZeRO-3 and full finetune

#5
by Andriy - opened

Hi! A question: did you have challenges using DeepSpeed ZeRO-3 with a full finetune? What was the reason for using DeepSpeed ZeRO-2 and QLoRA? I'm asking because we have an issue with LLMs and DeepSpeed ZeRO-3: if you load an LLM with ZeRO-3, then save it, and then load it again, the model comes back broken. Did you experience something like that?
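The broken save/load cycle described above is a known ZeRO-3 pitfall: stage 3 partitions the weights across ranks, so a plain save writes only placeholder shards unless DeepSpeed is told to gather them first. A minimal config sketch — the `stage3_gather_16bit_weights_on_model_save` flag is the documented DeepSpeed option for this; the surrounding keys are illustrative:

```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto"
}
```

Alternatively, DeepSpeed writes a `zero_to_fp32.py` script into each checkpoint directory that can reconstruct a full fp32 state dict from the partitioned shards offline.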

I always use QLoRA to save VRAM. I can't use DeepSpeed ZeRO-3 - I always get error messages.

Abacus.AI, Inc. org

While it was not used for this model, we have successfully used ZeRO-3 with both full finetuning and LoRA on other models. Depending on the setup, we have to run a manual weight-gather step; other than that it seems to work. When we use ZeRO-3 + LoRA we disable optimizer offload, since the LoRA weights tend to be a small fraction of the total. We have not tested QLoRA.
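The manual weight-gather step mentioned above can be sketched with DeepSpeed's `deepspeed.zero.GatheredParameters` context manager, which temporarily reassembles the stage-3 partitioned parameters on one rank so a full state dict can be saved. This is a sketch, not the exact code used here; it assumes `model_engine` is the engine returned by `deepspeed.initialize` and requires a live multi-rank ZeRO-3 run:

```python
import torch
import deepspeed

def save_full_model(model_engine, path):
    """Gather ZeRO-3 partitioned parameters onto rank 0 and save a full state dict."""
    # GatheredParameters reassembles each partitioned parameter in-place for the
    # duration of the context; modifier_rank=0 means only rank 0 sees the full
    # tensors (other ranks still hold empty placeholder shards).
    params = list(model_engine.module.parameters())
    with deepspeed.zero.GatheredParameters(params, modifier_rank=0):
        if model_engine.global_rank == 0:
            torch.save(model_engine.module.state_dict(), path)
```

All ranks must enter the context together (it is a collective operation), but only rank 0 performs the actual save.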

siddartha-abacus changed discussion status to closed
