NaNs while fine-tuning

#14
by edmond - opened

How come, no matter what learning rate I use, my predictions end up giving NaNs?
My inputs' max and min values are fine, and so are the outputs', but for some reason I still end up with NaNs.
I even get NaNs instantly if I train with a soft prompt via self.trans(inputs_embeds=patch_emb) (min and max are okay).
When I predict before training, the values are fine too. And if I train on BERT and inject the soft-prompt information by adding the same embedding across the whole sequence, it works fine.
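
In case it helps anyone debugging a similar setup, here is a minimal sketch of how one might locate the first module that produces non-finite values during a forward pass. It is only illustrative: `model` and `patch_emb` are placeholders standing in for the actual model and soft-prompt embeddings discussed above.

```python
import torch

def add_nan_hooks(model):
    """Register forward hooks that report the first module emitting NaN/Inf outputs."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    print(f"non-finite values first appear in: {name}")
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Hypothetical usage:
# handles = add_nan_hooks(model)
# out = model(inputs_embeds=patch_emb)
# for h in handles:
#     h.remove()
```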

I'm experiencing the same issue. Did you find a solution?

Good to know. No, it failed, and I don't think it's my fault, as I had great results using a masked decoder only, https://huggingface.co/microsoft/layoutlmv3-large, even though it's not supposed to work well since it isn't trained on natural images.
I haven't had time to try https://huggingface.co/bigscience/mt0-base (I know BigScience are serious people, as BLOOMZ works amazingly for me), which might not be buggy; please tell me if you get any results with it.

I'm experiencing the same issue. Did you find a solution? @rburke45

Were you using a reduced-precision version of the model? Both the FP16 and INT8 models output NaNs during training, but the full-precision model is training fine for me now.
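
For reference, a minimal sketch of loading the checkpoint in full precision with `transformers` (the checkpoint name below is a placeholder, not the actual model from this thread):

```python
import torch
from transformers import AutoModel

# Full-precision load (the variant reported to train without NaNs):
model_fp32 = AutoModel.from_pretrained("model-checkpoint", torch_dtype=torch.float32)

# Half-precision load (the variant reported to produce NaNs during training):
# model_fp16 = AutoModel.from_pretrained("model-checkpoint", torch_dtype=torch.float16)
```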

Yes, I am using fp16, but I read in a thread that William Falcon said using fp32 will only postpone the phenomenon.
Maybe he was wrong. Did you succeed in training it until overfitting? @rburke45
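
A related option (not from this thread, just a common workaround) is mixed-precision training with loss scaling, which keeps master weights in fp32 while computing in fp16 and often avoids the overflow-driven NaNs. A minimal sketch, assuming a standard PyTorch training loop with placeholder `model`, `optimizer`, and `batch`:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    # Forward pass in mixed precision.
    with autocast():
        loss = model(**batch).loss
    # Scale the loss to avoid fp16 gradient underflow/overflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```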

Yep, trained a full 40 epochs with overfitting starting around epoch 20. Not sure what issue William Falcon has, but I'm not seeing it here.

Thanks ok

edmond changed discussion status to closed
