Model outputs NaNs
#17 by oolongie - opened
I've been trying to fine-tune gemma-2-2b. Training works fine locally on CPU, but on both of the clusters I have access to, the model outputs NaNs. I'm using the same Singularity container and the same fine-tuning dataset in every case, so the only likely difference is that the clusters run on GPUs. What could cause this behaviour?
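One thing that may be worth ruling out is a precision difference. This is an assumption, since the post does not say which dtype the GPU runs use: if the CPU run is in float32 and the GPU runs are in float16, large intermediate values can overflow to infinity and then turn into NaN. The sketch below only illustrates that mechanism using the Python standard library; `to_float16` and `fp16_mul` are hypothetical helpers written for this illustration, not part of any training framework.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration: struct refuses to pack values
    that are too large for half precision, so overflow is mapped to
    +/-inf, which is what half-precision arithmetic would produce.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def fp16_mul(a: float, b: float) -> float:
    """Multiply two values, rounding inputs and result to half precision."""
    return to_float16(to_float16(a) * to_float16(b))


x = 300.0

# Wider range (ordinary Python floats): the square is finite.
wide_square = x * x

# Half precision: 90000 exceeds the largest finite float16 (65504),
# so the result overflows to infinity.
half_square = fp16_mul(x, x)

# A later operation such as subtracting a maximum (inf - inf) yields NaN.
after_subtraction = half_square - half_square

print(wide_square, half_square, after_subtraction)
```

If the GPU runs do turn out to use float16, rerunning them in float32 or bfloat16 (which has the same exponent range as float32) would show whether overflow is the cause.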
oolongie changed discussion status to closed