Output in English

#3
by importsmart - opened

I was hoping to speak in Hindi and get translated English output from this model, since it understands Hindi better than the non-fine-tuned Whisper.
Changing decoder_prompt_ids to (language="en", task="translate") had no effect, and adding generate_kwargs = {"task":"translate", "language":"<|en|>"} during inference stopped inference entirely.
What should I try?

Since this model has been fine-tuned for 57000 steps on about 3500 hours of data, which is quite substantial, it has most likely become strongly biased towards generating Hindi tokens (which is desirable for ASR fine-tuning) and therefore does poorly on the translation task.

To obtain an English translation of Hindi audio, you may try the 3 methods listed below:

  1. Set decoder_prompt_ids to (language="hi", task="translate") and use OpenAI's original Whisper model. The accuracy from Hindi to English is quite low, though.
  2. Use a speech translation model. There are not many good options for Indian languages, though.
  3. Use a cascade system. For example, get the Hindi transcript from a good Automatic Speech Recognition (ASR) model (you could continue using this whisper-hindi-small for this), then pass that output to a machine translation model. NLLB from Facebook (https://huggingface.co/facebook/nllb-200-distilled-600M) is good enough from Hindi to English (it has a 20-30% error). In my opinion this is the better option for getting an English translation of Hindi audio.
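
As a rough illustration of option 3, here is a minimal sketch of the cascade using the Hugging Face transformers pipeline API. Some of it is assumed rather than taken from this thread: ASR_MODEL_ID is only a placeholder and needs to be replaced with the full Hub id of the whisper-hindi-small checkpoint, the function names are made up for the example, and hin_Deva / eng_Latn are the NLLB language codes for Hindi and English. Treat it as a starting point, not a tested recipe.

```python
from typing import Callable

# Assumed model ids: the NLLB id comes from the link in the reply above; the
# ASR id is a placeholder and must be replaced with the full Hub id.
ASR_MODEL_ID = "whisper-hindi-small"  # placeholder, not a complete Hub id
MT_MODEL_ID = "facebook/nllb-200-distilled-600M"


def cascade_translate(audio_path: str,
                      transcribe: Callable[[str], str],
                      translate: Callable[[str], str]) -> dict:
    """Run ASR first, then machine translation on the resulting transcript."""
    hindi_text = transcribe(audio_path).strip()
    # Skip the translation step when the ASR model returned nothing.
    english_text = translate(hindi_text).strip() if hindi_text else ""
    return {"hindi": hindi_text, "english": english_text}


def build_hf_steps(asr_model: str = ASR_MODEL_ID, mt_model: str = MT_MODEL_ID):
    """Wrap two transformers pipelines as plain str -> str callables."""
    # Imported lazily so the cascade logic can be used without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=asr_model)
    mt = pipeline("translation", model=mt_model,
                  src_lang="hin_Deva", tgt_lang="eng_Latn")

    def transcribe(path: str) -> str:
        # The ASR pipeline returns a dict with the transcript under "text".
        return asr(path)["text"]

    def translate(text: str) -> str:
        # The translation pipeline returns a list with one dict per input.
        return mt(text)[0]["translation_text"]

    return transcribe, translate
```

To use it, call transcribe, translate = build_hf_steps() once (this downloads both models on first use) and then cascade_translate("clip.wav", transcribe, translate), where clip.wav is a hypothetical file name. Keeping the two steps as separate callables makes it easy to swap either model later, for example to try a different translation model.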

Thank you so much for the detailed reply. I'm also looking to use it for mixed "Hinglish" audio. OpenAI's model stays better for that than the cascade system, but the cascade works very well for pure Hindi.

importsmart changed discussion status to closed
