Add memory calculation for ZeRO stages

#14
by deleted - opened

Training large models with the Adam optimizer consumes a lot of memory, so some people train them with a framework like DeepSpeed, which implements the ZeRO algorithms (ZeRO: Memory Optimizations Toward Training Trillion Parameter Models) to save memory. It would be greatly appreciated if you provided the memory usage for the different ZeRO stages, since finding this out by experiment is expensive.
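For context, the ZeRO paper gives closed-form estimates for the model-state memory per GPU under mixed-precision Adam training: 16 bytes per parameter without partitioning, with the optimizer states (stage 1), gradients (stage 2), and parameters (stage 3) progressively sharded across the data-parallel group. A minimal sketch of those formulas (the function name is my own; activations, communication buffers, and fragmentation are not included):

```python
def zero_memory_gb(num_params, num_gpus, stage):
    """Estimate model-state memory per GPU in GB for a given ZeRO stage.

    Assumes mixed-precision Adam, i.e. per parameter:
    2 bytes (fp16 weights) + 2 bytes (fp16 gradients)
    + 12 bytes (fp32 master copy, momentum, variance).
    Formulas follow the ZeRO paper; activations and buffers are excluded.
    """
    psi, n = num_params, num_gpus
    if stage == 0:    # plain data parallelism, everything replicated
        total = 16 * psi
    elif stage == 1:  # optimizer states partitioned across n GPUs
        total = (2 + 2) * psi + 12 * psi / n
    elif stage == 2:  # + gradients partitioned
        total = 2 * psi + (2 + 12) * psi / n
    elif stage == 3:  # + parameters partitioned
        total = 16 * psi / n
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return total / 1e9


# Example: a 7.5B-parameter model on 64 GPUs, matching Table 1 of the paper
for stage in range(4):
    print(f"stage {stage}: {zero_memory_gb(7.5e9, 64, stage):.1f} GB/GPU")
```

For 7.5B parameters on 64 GPUs this reproduces the paper's numbers: roughly 120 GB per GPU at baseline, 31.4 GB at stage 1, 16.6 GB at stage 2, and 1.9 GB at stage 3.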

accelerate org

We are looking into the possibility of doing this :)
