Loss and eval WER getting worse after a few epochs

#12
by andrespm - opened

Hi @ylacombe

I am trying to fine-tune w2v-bert-2.0 for Spanish following your post, but after the first few steps, in which both the WER and the loss decrease, both start to grow from the second epoch onwards. In some of the runs I have also found that eval_loss suddenly becomes NaN and eval_wer becomes 1.0.
Any idea what might be happening?
As my dataset I am using Common Voice splits. The strange thing is that I have fine-tuned for another language (Galician), also using Common Voice and a similar split, and there training works perfectly: the loss and the WER decrease progressively.

The behaviour I find is similar to the one presented in this discussion:
https://discuss.huggingface.co/t/wav2vec2-loss-growing-in-training-and-validation-after-few-epochs/14165

Thanks!

Owner

Hey @andrespm,
Thanks for your message. In my experience, this should be fixable. Here are a few tips:

  • For eval_loss=NaN, you can set config.ctc_zero_infinity = True, which zeroes out infinite CTC losses instead of letting them propagate as NaN (docs here)
  • The Spanish subset of Common Voice is much larger than the Galician or Mongolian ones, so you need to adapt the hyper-parameters and training config: ideally, train for many more epochs at a lower learning rate. See the sketch after this list.
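
Not from the thread itself, just an illustration: a minimal sketch of both suggestions, assuming the Trainer-based CTC recipe from the blog post. The `facebook/w2v-bert-2.0` checkpoint is the public base model; `VOCAB_SIZE`, the output directory, and all hyper-parameter values below are placeholder assumptions, not verified settings for Spanish.

```python
from transformers import TrainingArguments, Wav2Vec2BertForCTC

VOCAB_SIZE = 40  # placeholder: size of the character vocab built from your transcripts

# Zero out infinite CTC losses (which otherwise surface as eval_loss=NaN)
# when short inputs cannot be aligned to their targets.
model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    ctc_loss_reduction="mean",
    ctc_zero_infinity=True,
    vocab_size=VOCAB_SIZE,
)

# Conservative schedule: a lower learning rate, compensated by more epochs.
training_args = TrainingArguments(
    output_dir="w2v-bert-2.0-spanish",  # hypothetical output directory
    learning_rate=1e-5,                 # illustrative: lower than a typical 5e-5 starting point
    num_train_epochs=30,                # illustrative: train longer at the lower rate
    warmup_ratio=0.1,
    per_device_train_batch_size=16,
    evaluation_strategy="steps",
    eval_steps=500,
)
```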

Let me know if that helps!

Hey @ylacombe

Thank you, I will give that configuration a try. However, I am not sure the dataset size is the problem: I am using a partition of the Spanish Common Voice so that the resulting dataset is similar in size to the Galician one (I train on the same number of sentences). I have also tried different versions of Common Voice for Spanish, in case there is a problem with the particular version I was using, but I still don't get good results.
I am also going to try a dataset other than Common Voice, although I don't see how that could be the cause.

Best!

Owner

In that case, config.ctc_zero_infinity = True and a lower learning rate should help! You can also set a minimum audio length (e.g. > 2 s). The model clearly works well for Spanish, so it's just a matter of finding the right parameters to "crack" the training.
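
For the minimum-length filter, a sketch with 🤗 Datasets might look like the following; the 2-second threshold comes from the suggestion above, while the Common Voice version and the 16 kHz resampling are example choices you should adapt to your setup.

```python
from datasets import Audio, load_dataset

MIN_SECONDS = 2.0  # drop clips of 2 s or shorter, as suggested above

# The dataset version is an example; substitute the split you are training on.
cv_es = load_dataset("mozilla-foundation/common_voice_16_0", "es", split="train")
cv_es = cv_es.cast_column("audio", Audio(sampling_rate=16_000))

def long_enough(example):
    # Duration in seconds = number of samples / sampling rate.
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"] > MIN_SECONDS

cv_es = cv_es.filter(long_enough)
```

Very short clips are also where infinite CTC losses typically come from (inputs too short to align to their targets), so this filter attacks the NaN problem at its source as well.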
