Notebook constantly running out of CUDA memory

#12
by vectornaut - opened

I have tried all of the available GPUs (A100, L4, T4) and reduced the training batch size to 2. No matter what, though, when trying to train the model, I constantly run out of CUDA memory.

Cohere For AI org

Hey @vectornaut

are you training in FP32 or bfloat16?
Aya 23 8B takes ~32GB of VRAM for inference in FP32 and ~18GB in bfloat16. Training takes roughly 4x the VRAM required for inference, so I'd recommend picking GPUs accordingly.
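A quick back-of-the-envelope check of those numbers (a weights-only estimate, assuming 8B parameters; it gives ~16GB for bfloat16, slightly below the ~18GB quoted above, since real inference also needs activations and KV cache):

```python
# Weights-only VRAM estimate for an 8B-parameter model.
# Illustrative only: real usage adds activations, KV cache,
# and (for training) gradients and optimizer state.
PARAMS = 8e9  # assumed parameter count for Aya 23 8B

def weights_gb(bytes_per_param: float) -> float:
    """Memory in GB needed just to hold the weights."""
    return PARAMS * bytes_per_param / 1e9

fp32_gb = weights_gb(4.0)  # FP32: 4 bytes/param -> 32.0 GB
bf16_gb = weights_gb(2.0)  # bfloat16: 2 bytes/param -> 16.0 GB

# Rule of thumb from above: training needs ~4x the inference footprint.
train_fp32_gb = 4 * fp32_gb  # -> 128.0 GB, beyond any single A100/L4/T4
print(fp32_gb, bf16_gb, train_fp32_gb)
```

This is why an 80GB A100 still OOMs on full FP32 fine-tuning regardless of batch size: the weights, gradients, and optimizer state alone exceed the card before a single batch is processed.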
