
Training memory requirements.

#1
by thanhdaonguyen - opened

What is the minimum memory it takes to train DPO on Llama-3-70B with context length 4096? And what is the config to achieve that?

Got this answer from my colleague who worked on it:

The minimum is 80 GB of VRAM per GPU, with 8 GPUs per node.
With TP=8 (tensor parallelism) and PP=2 (pipeline parallelism), you need a minimum of 2 nodes to host the model, which lets you train with DP=1 (data parallelism).
We used 32 nodes for training, which gave us DP=16. These numbers assume you're using NeMo's bf16-mixed precision mode.
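The parallelism arithmetic above can be sanity-checked with a quick sketch (the TP/PP sizes and node counts are taken from the answer; variable names are illustrative):

```python
# Each model replica is sharded across TP * PP GPUs.
gpus_per_node = 8
tp = 8   # tensor-parallel size
pp = 2   # pipeline-parallel size

model_parallel_gpus = tp * pp                      # GPUs per model replica
min_nodes = model_parallel_gpus // gpus_per_node   # smallest node count, DP=1
print(min_nodes)  # 2

# With 32 nodes, the remaining GPUs become data-parallel replicas.
total_nodes = 32
dp = (total_nodes * gpus_per_node) // model_parallel_gpus
print(dp)  # 16
```

So the 2-node minimum and the DP=16 figure both follow directly from TP=8, PP=2, and 8 GPUs per node.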

Thank you!
