DeepSpeed ZeRO-3 and full finetune

#5
by Andriy - opened

Hi! A question: did you have challenges using DeepSpeed ZeRO-3 with a full finetune? What was the reason for using DeepSpeed ZeRO-2 and QLoRA? I'm asking because we have an issue with LLMs and DeepSpeed ZeRO-3: if you load an LLM with ZeRO-3, then save it, and then load it again, the model comes back broken. Did you experience something like that?
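The broken save/load cycle described above is a known ZeRO-3 pitfall: stage 3 partitions the weights across ranks, so a plain save writes only placeholder shards unless DeepSpeed is told to gather them first. A minimal config sketch — the `stage3_gather_16bit_weights_on_model_save` flag is the documented DeepSpeed option for this; the surrounding keys are illustrative:

```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto"
}
```

Alternatively, DeepSpeed writes a `zero_to_fp32.py` script into each checkpoint directory that can reconstruct a full fp32 state dict from the partitioned shards offline.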

I always use QLoRA to save VRAM. I can't use DeepSpeed ZeRO-3 - I always get error messages.

Abacus.AI, Inc. org

While it was not used for this model, we have successfully used ZeRO-3 with both full finetuning and LoRA on other models. Depending on the setup, we have to run a manual weight-gather step; other than that it seems to work. When we use ZeRO-3 + LoRA we disable optimizer offload, since the LoRA weights tend to be a small fraction of the total. We have not tested QLoRA.
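The manual weight-gather step mentioned above can be sketched with DeepSpeed's `deepspeed.zero.GatheredParameters` context manager, which temporarily reassembles the stage-3 partitioned parameters on one rank so a full state dict can be saved. This is a sketch, not the exact code used here; it assumes `model_engine` is the engine returned by `deepspeed.initialize` and requires a live multi-rank ZeRO-3 run:

```python
import torch
import deepspeed

def save_full_model(model_engine, path):
    """Gather ZeRO-3 partitioned parameters onto rank 0 and save a full state dict."""
    # GatheredParameters reassembles each partitioned parameter in-place for the
    # duration of the context; modifier_rank=0 means only rank 0 sees the full
    # tensors (other ranks still hold empty placeholder shards).
    params = list(model_engine.module.parameters())
    with deepspeed.zero.GatheredParameters(params, modifier_rank=0):
        if model_engine.global_rank == 0:
            torch.save(model_engine.module.state_dict(), path)
```

All ranks must enter the context together (it is a collective operation), but only rank 0 performs the actual save.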

siddartha-abacus changed discussion status to closed
