Model outputs NaNs

#17
by oolongie - opened

I've been trying to fine-tune gemma-2-2b. It works fine when I train it locally on CPU, but on two of the clusters I have access to, the model outputs NaNs. I'm using a Singularity container and the same fine-tuning dataset on both, so the only difference is likely the use of GPUs. What could be causing this behaviour?

Okay, so I've been able to find the cause and a solution. The NaNs are caused by padding; several possible solutions are proposed here.
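For anyone landing here later, a rough sketch of one common way padding can turn into NaNs. This is an assumption about the mechanism, not something confirmed for this exact setup: if every position in an attention row is masked out (as can happen for padding tokens), the softmax is taken over a row of `-inf` values and returns NaN. The function names below are made up for the illustration and are not part of any library.

```python
import math

NEG_INF = float("-inf")

def masked_softmax(scores, mask):
    """Softmax over `scores`; positions where mask is 0 are set to -inf first."""
    masked = [s if m else NEG_INF for s, m in zip(scores, mask)]
    peak = max(masked)
    # If every entry is -inf, then -inf - (-inf) is NaN and it spreads everywhere.
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

def safe_masked_softmax(scores, mask):
    """Hypothetical workaround: return zeros for a row with no visible positions."""
    if not any(mask):
        return [0.0] * len(scores)
    return masked_softmax(scores, mask)

scores = [0.5, 1.0, -0.3]
normal_row = masked_softmax(scores, [1, 1, 0])      # last position is padding
padded_row = masked_softmax(scores, [0, 0, 0])      # every position is padding
safe_row = safe_masked_softmax(scores, [0, 0, 0])   # guarded version of the same row
```

Here `normal_row` is a valid distribution with zero weight on the padded position, `padded_row` is all NaN, and `safe_row` is all zeros. Guarding fully masked rows is only one possible fix; whether it applies depends on where the padding sits and how the attention mask is built in your training code.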

oolongie changed discussion status to closed