Spaces: openai/whisper


Language identification gets worse after multilingual fine-tuning

#93

by andrespm - opened Jul 3, 2023

Discussion

andrespm

Jul 3, 2023

I am finding that language detection gets worse after multilingual fine-tuning. I have tried fine-tuning with and without the language label, and in both cases language detection gets worse on the trained languages.

Is it necessary to fine-tune Whisper on both the language identification and transcription tasks?
If so, how is this done?
Any idea why this is happening?

sanchit-gandhi

Jul 3, 2023 (edited Jul 3, 2023)

Hey @andrespm - we would expect Whisper's performance to either stay the same or degrade after fine-tuning, except on the language that we fine-tune it on. This is because the model is prone to 'forgetting' the knowledge that it acquired during pre-training and instead focussing entirely on the task presented during fine-tuning (i.e. the weights are tuned entirely for the multilingual ASR task).

What you can try is fine-tuning with LoRA / AdaLoRA. In my experience, these two approaches significantly improve the model's ability to retain its pre-training knowledge during fine-tuning.

See https://github.com/Vaibhavs10/fast-whisper-finetuning for details
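For readers unfamiliar with LoRA, here is a minimal sketch of the underlying idea in plain Python. It is not the code from the linked repository and not any library's API; the dimensions, rank, and `alpha` value are arbitrary assumptions chosen for illustration. The pre-trained weight matrix stays frozen and only a low-rank update `B @ A` is trained, which is one plausible reason the pre-training knowledge is better preserved.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w_frozen, lora_a, lora_b, alpha):
    """Return the effective weight W + (alpha / r) * B @ A.

    w_frozen: d_out x d_in pre-trained weight (never updated)
    lora_a:   r x d_in trainable matrix
    lora_b:   d_out x r trainable matrix
    """
    r = len(lora_a)
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_frozen, delta)
    ]

random.seed(0)
d_out, d_in, r, alpha = 64, 64, 4, 8  # illustrative sizes, not Whisper's

w_frozen = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
lora_a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
lora_b = [[0.0] * r for _ in range(d_out)]  # zero init: the update starts at zero

# Before any training step the adapted layer equals the pre-trained layer.
w_start = lora_weight(w_frozen, lora_a, lora_b, alpha)

trainable_params = r * (d_in + d_out)  # entries in A and B
full_params = d_out * d_in             # entries in W
print(trainable_params, full_params)
```

Because `lora_b` starts at zero, fine-tuning begins exactly at the pre-trained model, and only the small `A` and `B` matrices move away from it.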

andrespm

Jul 4, 2023

Hi @sanchit-gandhi

Thank you very much for your reply.
I will try LoRA / AdaLoRA and check the performance again.

However, one of the languages I am working with is Galician, which is under-represented in Whisper, and language identification with Whisper's pre-trained models does not work very well for it. I also work with Spanish and Portuguese, and the model tends to confuse Galician with these two languages, which are better represented in the base models.
I am also exploring the possibility of fine-tuning on both the language identification and transcription tasks, but I have not found any examples of this. If you have more information, that would be great.
I am also discussing it in this GitHub thread:
https://github.com/openai/whisper/discussions/1454#discussioncomment-6345649

Again, thanks!!!!

amine

Nov 15, 2024

Hi @andrespm
Have you managed to mitigate the decrease in LID quality after multilingual fine-tuning?
We are also observing this, even when prefixing the labels with the correct language tag.
Since language identification is done as token prediction, I think we are already training on both ASR and LID when we include the language token in the label tokens.
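To make that point concrete, here is a small sketch of how a label sequence with a language tag lines up under teacher forcing. The special-token strings follow Whisper's prompt format; the helper names and the word-level transcript pieces are hypothetical stand-ins for real tokenizer output.

```python
def build_label_tokens(transcript_tokens, language_code, task="transcribe"):
    """Prefix a transcript with Whisper-style special tokens (no timestamps)."""
    return [
        "<|startoftranscript|>",
        f"<|{language_code}|>",
        f"<|{task}|>",
        "<|notimestamps|>",
        *transcript_tokens,
        "<|endoftext|>",
    ]

def teacher_forcing_pairs(tokens):
    """Pair each decoder input token with the token it is trained to predict."""
    return list(zip(tokens[:-1], tokens[1:]))

# Illustrative pieces only; a real tokenizer would produce subword tokens.
tokens = build_label_tokens(["Bos", " días"], "gl")
pairs = teacher_forcing_pairs(tokens)

print(pairs[0])  # ('<|startoftranscript|>', '<|gl|>')
```

The first training target is the language token itself, so the cross-entropy loss on that position acts as a language identification loss, while the remaining positions cover transcription.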

andrespm

Nov 18, 2024

Hi @amine

Yes, you are right. By including the language tokens in this way we are already doing multitask training, and the LID results are good (at least in my case, with 6 languages).
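For anyone who wants to check LID in the same way, a rough sketch follows. It assumes LID is read off as the most probable language token at the first decoding step, with the softmax restricted to language tokens. The function name and the logit values are made up for illustration; in practice the logits would come from the model.

```python
import math

def language_probs(first_step_logits, language_tokens):
    """Softmax over language tokens only, ignoring every other vocabulary entry."""
    scores = {tok: first_step_logits[tok] for tok in language_tokens}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in scores.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Made-up logits for the position right after <|startoftranscript|>.
logits = {"<|gl|>": 2.0, "<|es|>": 1.5, "<|pt|>": 1.0, "<|transcribe|>": 5.0}
candidates = ["<|gl|>", "<|es|>", "<|pt|>"]

probs = language_probs(logits, candidates)
predicted = max(probs, key=probs.get)
print(predicted)  # <|gl|>
```

Comparing `predicted` with the reference language over a held-out set, before and after fine-tuning, gives a per-language LID accuracy and shows which languages are confused with each other.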

andrespm changed discussion status to closed Nov 18, 2024
