ZeRO-3 vs FSDP 1 vs FSDP 2
I realize that the blog post equates ZeRO 3 and FSDP. But what are the big differences in the DeepSpeed ZeRO 3 (https://github.com/deepspeedai/DeepSpeed/blob/master/deepspeed/runtime/zero/stage3.py) implementation and the Meta FSDP implementation (https://github.com/pytorch/pytorch/blob/v2.6.0/torch/distributed/fsdp/fully_sharded_data_parallel.py#L127)?
Additionally, it's well documented that the Meta FSDP implementation is less stable for longer training runs than the DeepSpeed ZeRO 3 implementation. Why is this the case? Is there anything inherent about the differences between the two implementations?
Finally, do you think the stability issues could be solved in FSDP 2 (https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md)?
I realize that the blog post equates ZeRO 3 and FSDP. But what are the big differences in the DeepSpeed ZeRO 3
You can think of FSDP as the PyTorch-native way to do ZeRO-3, which was initially implemented in DeepSpeed. The idea of ZeRO-3 stays the same, but each library implements it in its own way to make it compatible with its checkpointing logic, with other parallelisms, etc. You can find some of these differences in Accelerate's docs, but keep in mind that all of these libraries are still evolving to maximize efficiency, so in a month or two the two implementations may well converge.
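To make the shared idea concrete, here is a toy single-process sketch of the ZeRO-3 / FSDP sharding pattern. This is a hypothetical illustration, not DeepSpeed's or PyTorch's actual code: each of N "ranks" keeps only a 1/N slice of the flattened parameters, the full vector is all-gathered right before it is needed for forward/backward, and freed again afterwards.

```python
import numpy as np

def shard(flat_params: np.ndarray, world_size: int) -> list:
    """Split a flat parameter vector into one contiguous shard per rank
    (padding the tail so every rank holds the same number of elements)."""
    padded_len = ((len(flat_params) + world_size - 1) // world_size) * world_size
    padded = np.zeros(padded_len, dtype=flat_params.dtype)
    padded[: len(flat_params)] = flat_params
    return np.array_split(padded, world_size)

def all_gather(shards: list, numel: int) -> np.ndarray:
    """Reassemble the full parameter vector from all shards, dropping padding.
    In a real run this is the collective issued just before forward/backward."""
    return np.concatenate(shards)[:numel]

params = np.arange(10, dtype=np.float32)       # pretend flattened model weights
shards = shard(params, world_size=4)           # each rank stores ~1/4 of the memory
full = all_gather(shards, numel=len(params))   # materialized only around compute
assert np.array_equal(full, params)
```

Both libraries implement this same gather/free cycle; the differences are in how the flat buffers are built, when the collectives are scheduled, and how the shards map back to named parameters for checkpointing.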
Additionally, it's well documented that the Meta FSDP implementation is less stable for longer training runs than the DeepSpeed ZeRO 3 implementation. Why is this the case? Is there anything inherent about the differences between the two implementations?
source for this claim?
Finally, do you think the stability issues could be solved in FSDP 2 (https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md)?
In the blog post we linked to a nice write-up that explains some of the advantages of FSDP2 over FSDP (search for "FSDP2").
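One advantage that write-up highlights is the change in sharding layout, which can be sketched with toy tensors. This is an illustrative simplification, not either library's real code: FSDP1 flattens a group of parameters into one 1-D buffer and slices that buffer per rank (so parameter boundaries can fall mid-slice), while FSDP2 shards each parameter individually along dim 0, so every shard is itself a well-formed smaller tensor, which simplifies per-parameter state such as checkpoints and optimizer states.

```python
import numpy as np

params = {"w1": np.ones((4, 2)), "w2": np.full((2, 2), 2.0)}
world_size = 2

# FSDP1-style: one flat buffer per wrapped module, sliced per rank.
# A slice has no tensor structure of its own.
flat = np.concatenate([p.ravel() for p in params.values()])
fsdp1_shards = np.array_split(flat, world_size)

# FSDP2-style: each parameter sharded on dim 0, so each shard keeps the
# parameter's dtype and trailing shape (what DTensor tracks for real).
fsdp2_shards = {name: np.array_split(p, world_size, axis=0)
                for name, p in params.items()}

assert [s.shape for s in fsdp1_shards] == [(6,), (6,)]
assert [s.shape for s in fsdp2_shards["w1"]] == [(2, 2), (2, 2)]
assert [s.shape for s in fsdp2_shards["w2"]] == [(1, 2), (1, 2)]
```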
Hope that answers your questions! :)