Loss and eval WER getting worse after a few epochs

#12
by andrespm - opened

Hi @ylacombe

I am trying to fine-tune w2v-bert-2.0 for Spanish following your post, but after the first few steps, in which both the WER and the loss decrease, both start to grow from the second epoch onwards. In some of the runs I have also found that eval_loss suddenly becomes NaN and eval_wer becomes 1.0.
Any idea what might be happening?
As my dataset I am using Common Voice splits. The strange thing is that I have fine-tuned for another language (Galician), also using Common Voice and a similar split, and there training works perfectly: the loss and the WER decrease progressively.

The behaviour I find is similar to the one presented in this discussion:
https://discuss.huggingface.co/t/wav2vec2-loss-growing-in-training-and-validation-after-few-epochs/14165

Thanks!

Owner

Hey @andrespm,
Thanks for your message. In my experience, this should be fixable. Here are a few tips:

  • For eval_loss=NaN, you can set config.ctc_zero_infinity = True, which zeroes out infinite CTC losses instead of letting them propagate as NaN (docs here)
  • The Spanish subset of Common Voice is much larger than the Galician or Mongolian ones, so you need to adapt the hyper-parameters and training config: ideally, train for many more epochs at a lower learning rate. See the sketch after this list.
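
Not from the thread itself, just an illustration: a minimal sketch of both suggestions, assuming the Trainer-based CTC recipe from the blog post. The `facebook/w2v-bert-2.0` checkpoint is the public base model; `VOCAB_SIZE`, the output directory, and all hyper-parameter values below are placeholder assumptions, not verified settings for Spanish.

```python
from transformers import TrainingArguments, Wav2Vec2BertForCTC

VOCAB_SIZE = 40  # placeholder: size of the character vocab built from your transcripts

# Zero out infinite CTC losses (which otherwise surface as eval_loss=NaN)
# when short inputs cannot be aligned to their targets.
model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    ctc_loss_reduction="mean",
    ctc_zero_infinity=True,
    vocab_size=VOCAB_SIZE,
)

# Conservative schedule: a lower learning rate, compensated by more epochs.
training_args = TrainingArguments(
    output_dir="w2v-bert-2.0-spanish",  # hypothetical output directory
    learning_rate=1e-5,                 # illustrative: lower than a typical 5e-5 starting point
    num_train_epochs=30,                # illustrative: train longer at the lower rate
    warmup_ratio=0.1,
    per_device_train_batch_size=16,
    evaluation_strategy="steps",
    eval_steps=500,
)
```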

Let me know if that helps!

Hey @ylacombe

Thank you, I will give that configuration a try. However, I am not sure the dataset size is the problem: I am using a partition of the Spanish Common Voice so that the resulting dataset is similar in size to the Galician one (I train on the same number of sentences). I have also tried different versions of Common Voice for Spanish, in case there is a problem with the particular version I was using, but I still don't get good results.
I am also going to try a dataset other than Common Voice, although I don't see how that could be the cause.

Best!

Owner

In that case, config.ctc_zero_infinity = True and a lower learning rate should help! You can also set a minimum audio length (e.g. > 2 s). The model clearly works well for Spanish, so it's just a matter of finding the right parameters to "crack" the training.
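
For the minimum-length filter, a sketch with 🤗 Datasets might look like the following; the 2-second threshold comes from the suggestion above, while the Common Voice version and the 16 kHz resampling are example choices you should adapt to your setup.

```python
from datasets import Audio, load_dataset

MIN_SECONDS = 2.0  # drop clips of 2 s or shorter, as suggested above

# The dataset version is an example; substitute the split you are training on.
cv_es = load_dataset("mozilla-foundation/common_voice_16_0", "es", split="train")
cv_es = cv_es.cast_column("audio", Audio(sampling_rate=16_000))

def long_enough(example):
    # Duration in seconds = number of samples / sampling rate.
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"] > MIN_SECONDS

cv_es = cv_es.filter(long_enough)
```

Very short clips are also where infinite CTC losses typically come from (inputs too short to align to their targets), so this filter attacks the NaN problem at its source as well.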
