Memory Error While Fine-tuning AYA on 8 H100 GPUs
#23
by ArmanAsq - opened
Hello,
I am currently trying to fine-tune an AYA model on 8 H100 GPUs, but I keep hitting an out-of-memory error. My system has 640 GB of GPU memory in total (8 × 80 GB), which I assumed would be sufficient. I'm doing full fine-tuning (no PEFT or LoRA), with the batch size set to 1.
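For reference, here is my rough back-of-the-envelope arithmetic (assuming the 13B-parameter Aya-101 checkpoint and a standard mixed-precision Adam setup, since I haven't found official numbers). It suggests the limit is the 80 GB on each individual GPU, not the 640 GB total:

```python
# Rough per-GPU memory estimate for full fine-tuning (no PEFT/LoRA),
# assuming a 13B-parameter Aya-101 model and mixed-precision Adam.
# All figures are approximations, not measured values.

params = 13e9

bytes_per_param = (
    2    # bf16 weights
    + 4  # fp32 gradients
    + 8  # Adam first/second moments (fp32)
    + 4  # fp32 master copy of the weights
)  # ~18 bytes/param in a typical mixed-precision Adam setup

total_state_gb = params * bytes_per_param / 1e9
print(f"Model + optimizer states: ~{total_state_gb:.0f} GB")  # ~234 GB

# Under plain DDP every GPU holds a full replica, so ~234 GB would have
# to fit in a single 80 GB H100 -- it cannot, regardless of batch size.
# Sharding the states across 8 GPUs (ZeRO-3 / FSDP) brings the per-GPU
# share down to ~234 / 8 ≈ 29 GB, leaving headroom for activations.
print(f"Per-GPU share when sharded 8 ways: ~{total_state_gb / 8:.0f} GB")
```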
I'm wondering if anyone has encountered a similar issue and could provide some guidance. How many GPUs (or how much memory per GPU) are typically recommended for full fine-tuning of this model? Any help would be greatly appreciated.
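In the meantime, this is the kind of setup I'm planning to try next, based on my reading of the Trainer docs: sharding the parameters, gradients, and optimizer states across the GPUs with FSDP instead of replicating them. A minimal, untested sketch; `train_dataset` is a placeholder and the checkpoint name is my assumption:

```python
# Minimal sketch (untested): full fine-tuning with FSDP so parameters,
# gradients, and optimizer states are sharded across the 8 GPUs instead
# of replicated on each one. Assumes the CohereForAI/aya-101 checkpoint;
# `train_dataset` is a placeholder for a tokenized dataset.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "CohereForAI/aya-101"  # 13B mT5-based Aya model (assumed)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

train_dataset = ...  # placeholder: your tokenized training set

args = TrainingArguments(
    output_dir="aya-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # recover a usable effective batch size
    gradient_checkpointing=True,     # trade compute for activation memory
    bf16=True,                       # native on H100
    fsdp="full_shard auto_wrap",     # shard states across all 8 GPUs
    learning_rate=1e-5,
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```

I'd launch it with `torchrun --nproc_per_node=8 train.py`; DeepSpeed ZeRO-3 (via the `deepspeed` argument) should be an equivalent alternative.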
Thanks in advance!
shivalikasingh changed discussion status to closed