allenai/longformer-large-4096 · Gradient is NaN when fine-tuning PyTorch model

Apr 5, 2023

I ran into a problem while fine-tuning the Longformer Large PyTorch model: I got a NaN gradient error during training. When I ran the same fine-tuning process with the Longformer Base model, everything worked without any issues.

I am unsure what could be causing this problem with the Longformer Large model specifically. I have made sure that my data is free of NaN values, and I have checked that I am using the correct version of the Longformer tokenizer.

If anyone has any suggestions or has encountered a similar issue while fine-tuning the Longformer Large model, please let me know. I would be grateful for any assistance or insights.

Thank you in advance for your help.

epurdy

Jun 23, 2023

It can mean your learning rate is too large for this specific model. Try dropping it by a factor of 10 or more.
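
To illustrate why an overly large learning rate can end in NaN, here is a toy sketch in plain Python. It is not Longformer or PyTorch code; the function `run_gradient_descent`, the quadratic loss, and the two learning rates are all made up for the demonstration. With a step size that is too large, each update overshoots, the parameter grows until it overflows to infinity, and the next update computes inf minus inf, which is NaN. A 10x smaller step converges.

```python
import math


def run_gradient_descent(lr, steps=2000, x0=1.0):
    """Minimize the toy loss f(x) = 0.5 * x * x with plain gradient descent.

    Returns (final_x, first_bad_step), where first_bad_step is the index of
    the first update whose result is not finite (inf or NaN), or None if
    every update stayed finite.
    """
    x = x0
    first_bad_step = None
    for step in range(steps):
        grad = x  # derivative of 0.5 * x * x
        x = x - lr * grad
        if first_bad_step is None and not math.isfinite(x):
            first_bad_step = step
    return x, first_bad_step


# Hypothetical learning rates, chosen only to show the two regimes.
x_large, bad_large = run_gradient_descent(lr=3.0)   # each step multiplies x by -2
x_small, bad_small = run_gradient_descent(lr=0.3)   # each step multiplies x by 0.7

print("lr=3.0 ends in NaN:", math.isnan(x_large), "first non-finite step:", bad_large)
print("lr=0.3 converges:", abs(x_small) < 1e-6, "first non-finite step:", bad_small)
```

In a real fine-tuning run the loss surface is far more complicated, but the same mechanism applies, which is why a 10x smaller learning rate is a cheap first thing to try.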

Serega6678

Apr 27

@willieseun did you resolve the problem?

willieseun

May 3

Nope

Serega6678

May 3

@willieseun Thanks for the response! In my case, I just kept training for longer (without even reducing the learning rate). The gradients were only NaN some of the time, and they eventually became smaller. In the end, the problem resolved itself.
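
The thread does not say how the training loop handled the NaN steps, so the following is only a sketch of one way occasional NaN gradients can be tolerated: skip the update whenever the gradient is not finite, so the parameters are never overwritten with NaN. It is a plain-Python toy, and `sgd_skip_nonfinite` and `noisy_grad` are hypothetical names invented for this example. (PyTorch's mixed-precision `GradScaler` skips optimizer steps with inf/NaN gradients in a similar spirit, but whether mixed precision was used here is an assumption.)

```python
import math


def sgd_skip_nonfinite(grad_fn, x0, lr, steps):
    """Gradient descent that skips any step whose gradient is inf or NaN.

    Returns (final_x, skipped), where skipped counts the ignored steps.
    """
    x = x0
    skipped = 0
    for step in range(steps):
        grad = grad_fn(x, step)
        if not math.isfinite(grad):
            skipped += 1  # leave x untouched instead of poisoning it with NaN
            continue
        x -= lr * grad
    return x, skipped


def noisy_grad(x, step):
    """Hypothetical gradient of 0.5 * x * x that is NaN on every third step."""
    return float("nan") if step % 3 == 0 else x


x_final, skipped = sgd_skip_nonfinite(noisy_grad, x0=1.0, lr=0.1, steps=300)
print("skipped steps:", skipped, "converged:", abs(x_final) < 1e-6)
```

Without the `isfinite` check, the first NaN gradient would turn `x` into NaN and every later step would stay NaN.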

willieseun

May 4

Alright. Thanks for the update.