Fine-tuning Whisper in more than one language

#90
by andrespm - opened

Suppose I have a dataset in two or more languages (one of them under-represented in Whisper's pre-trained models), and I want to fine-tune on those languages while keeping the model multilingual and avoiding catastrophic forgetting. Is such fine-tuning possible?

Can I define the tokenizer and the processor without indicating the language?
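For context, a minimal sketch of the two ways to load the processor (using `openai/whisper-small` as an example; the language value is only illustrative):

```python
from transformers import WhisperProcessor

# Usual single-language setup: the language token is fixed in the prefix
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="spanish", task="transcribe"
)

# Without a language: no language token is fixed, so it can either be left
# out of the labels entirely (language-agnostic) or set per example later
# via processor.tokenizer.set_prefix_tokens(language=..., task="transcribe")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
```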

andrespm changed discussion status to closed

Hi! I am looking at a similar scenario. In case you managed to find a solution, would you be able to share it? :)

andrespm changed discussion status to open

Hi @bmichele !
I've found some sort of solution on this thread:
https://huggingface.co/spaces/openai/whisper/discussions/6#643d8bc551e2958ef6cd69ef

However, I'm still wondering which is the best strategy:

  • I've tried fine-tuning sequentially, but the results get worse with each fine-tuning cycle.
  • I've tried fine-tuning in a multilingual way, omitting the "lang" label in the tokenizer and the processor and relying on Whisper's ability to detect the language, and the results are promising.
  • Finally, I've tried fine-tuning in a multilingual way, setting the lang label as described in the discussion. However, the results are not as expected, so I'm wondering if I did something wrong.

It would be nice if someone else tried this approach to confirm my results :) A rough sketch of the mixed-language setup I mean is below.
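A sketch of the second and third strategies, assuming Common Voice 11 with "es" and "gl" as the two languages; the dataset (which is gated on the Hub), the column names, and the 50/50 mixing ratio are only illustrative:

```python
from datasets import Audio, interleave_datasets, load_dataset
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def load_lang(config, lang):
    # gated dataset: accept the Common Voice terms on the Hub first
    ds = load_dataset("mozilla-foundation/common_voice_11_0", config, split="train")
    ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
    return ds.add_column("lang", [lang] * len(ds))

# one training stream mixing both languages, so neither is seen in isolation
mixed = interleave_datasets(
    [load_lang("es", "spanish"), load_lang("gl", "galician")],
    probabilities=[0.5, 0.5],
    seed=42,
)

def prepare(batch):
    audio = batch["audio"]
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # per-example language token (third strategy); drop the language
    # argument to train language-agnostically instead (second strategy)
    processor.tokenizer.set_prefix_tokens(language=batch["lang"], task="transcribe")
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

mixed = mixed.map(prepare, remove_columns=mixed.column_names)
```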

@andrespm hey, any update on this? Over the months, have any solid solutions emerged that converge the model to a good WER on multiple languages?

Any updates on this thread would be helpful. I would also like to know how to improve the translation task along with transcription.
I was also wondering whether this same multi-language approach works for LoRA fine-tuning. I tried LoRA, and the language-agnostic approach gave me lower accuracy.
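For reference, a minimal sketch of attaching LoRA adapters to Whisper with the peft library; the rank, alpha, and target modules follow the common recipe and are only illustrative. The mixed-language data preparation above stays the same:

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Adapters on the attention projections; the base weights stay frozen, which
# also limits how much of the original multilingual behaviour gets overwritten
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters train
```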
