Fine-tuning Whisper in more than one language

#90
by andrespm - opened

Suppose I have a dataset in two or more languages (one of them under-represented in Whisper's pre-trained models), and I want to fine-tune on those languages while keeping the model multilingual and avoiding catastrophic forgetting. Is such fine-tuning possible?

Can I define the tokenizer and the processor without indicating the language?
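For context, a minimal sketch of the two ways to load the processor (using `openai/whisper-small` as an example; the language value is only illustrative):

```python
from transformers import WhisperProcessor

# Usual single-language setup: the language token is fixed in the prefix
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="spanish", task="transcribe"
)

# Without a language: no language token is fixed, so it can either be left
# out of the labels entirely (language-agnostic) or set per example later
# via processor.tokenizer.set_prefix_tokens(language=..., task="transcribe")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
```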

andrespm changed discussion status to closed

Hi! I am looking at a similar scenario. In case you managed to find a solution, would you be able to share it? :)

andrespm changed discussion status to open

Hi @bmichele !
I've found some sort of solution on this thread:
https://huggingface.co/spaces/openai/whisper/discussions/6#643d8bc551e2958ef6cd69ef

However, I'm still wondering which is the best strategy:

  • I've tried fine-tuning sequentially, but the results get worse with each fine-tuning cycle.
  • I've tried fine-tuning in a multilingual way, omitting the "lang" label in the tokenizer and the processor and relying on Whisper's ability to detect the language, and the results are promising.
  • Finally, I've tried fine-tuning in a multilingual way, setting the lang label as described in the discussion. However, the results are not as expected, so I'm wondering if I did something wrong.

It would be nice if someone else tried this approach to confirm my results :) A rough sketch of the mixed-language setup I mean is below.
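A sketch of the second and third strategies, assuming Common Voice 11 with "es" and "gl" as the two languages; the dataset (which is gated on the Hub), the column names, and the 50/50 mixing ratio are only illustrative:

```python
from datasets import Audio, interleave_datasets, load_dataset
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def load_lang(config, lang):
    # gated dataset: accept the Common Voice terms on the Hub first
    ds = load_dataset("mozilla-foundation/common_voice_11_0", config, split="train")
    ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
    return ds.add_column("lang", [lang] * len(ds))

# one training stream mixing both languages, so neither is seen in isolation
mixed = interleave_datasets(
    [load_lang("es", "spanish"), load_lang("gl", "galician")],
    probabilities=[0.5, 0.5],
    seed=42,
)

def prepare(batch):
    audio = batch["audio"]
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # per-example language token (third strategy); drop the language
    # argument to train language-agnostically instead (second strategy)
    processor.tokenizer.set_prefix_tokens(language=batch["lang"], task="transcribe")
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

mixed = mixed.map(prepare, remove_columns=mixed.column_names)
```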

@andrespm hey, any update on this? Over the months, have any solid solutions emerged that converge the model to a good WER on multiple languages?

Any updates on this thread would be helpful. I would also like to know how to improve the translation task along with transcription.
I was also wondering whether this same multi-language approach works for LoRA fine-tuning. I tried LoRA, and the language-agnostic approach gave me lower accuracy.
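For reference, a minimal sketch of attaching LoRA adapters to Whisper with the peft library; the rank, alpha, and target modules follow the common recipe and are only illustrative. The mixed-language data preparation above stays the same:

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Adapters on the attention projections; the base weights stay frozen, which
# also limits how much of the original multilingual behaviour gets overwritten
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters train
```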
