
Memory Error While Fine-tuning AYA on 8 H100 GPUs

#23 by ArmanAsq - opened

Hello,

I am trying to do a full fine-tune of the Aya model on 8 H100 GPUs, but I'm hitting an out-of-memory error. My system has 640 GB of GPU memory in total (8 × 80 GB), which I assumed would be sufficient: Aya-101 is a ~13B-parameter model, so weights, gradients, and Adam optimizer states in mixed precision should come to roughly 13B × 16 bytes ≈ 208 GB. I'm not using PEFT or LoRA, and my per-device batch size is set to 1.
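For reference, here is roughly the kind of setup I'm running. It's a minimal sketch assuming a standard Seq2SeqTrainer loop; the dataset file (train.json), sequence lengths, and hyperparameters are stand-ins rather than my exact script:

```python
# Minimal sketch of the fine-tuning setup (placeholders marked below).
# Launched with: torchrun --nproc_per_node=8 train.py
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "CohereForAI/aya-101"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder dataset: any JSON file of seq2seq pairs with "input"/"target" fields.
raw = load_dataset("json", data_files="train.json")["train"]

def preprocess(batch):
    enc = tokenizer(batch["input"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=512, truncation=True)
    enc["labels"] = labels["input_ids"]
    return enc

train_ds = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="aya-ft",
    per_device_train_batch_size=1,   # batch size 1, as described above
    gradient_checkpointing=True,     # placeholder memory-saving setting
    bf16=True,
    learning_rate=1e-5,              # placeholder hyperparameter
    num_train_epochs=1,
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```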
I'm wondering if anyone has encountered a similar issue and could provide some guidance. How many GPUs are typically recommended for this task? Any help would be greatly appreciated.

Thanks in advance!
