Not getting English transcription from the model

by tusharagarwal3 - opened Apr 28, 2023

Apr 28, 2023

Hi,

I am trying to get English text from Hindi audio using the whisper-hindi-large-v2 model. Even when I specify the language as "en" (I also tried "english"), the model does not return English output. It returns text in an Indic language, most likely Hindi.

Please let me know if I am doing something wrong.

tusharagarwal3

Apr 28, 2023

Found this related issue on the Transformers GitHub repository: https://github.com/huggingface/transformers/issues/21937.

Is there any workaround for this?

I am new to exploring this, so I might be missing something.

tusharagarwal3

May 2, 2023

Also, could you help me access the model embeddings? I need them for downstream tasks.

vasista22

Owner May 6, 2023

Hi @tusharagarwal3 ,

Detailed instructions for extracting embeddings from the various layers are provided here: https://github.com/vasistalodagala/whisper-finetune#extract-embeddings-from-whisper-models
Hope that helps.
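For a rough idea of what the extraction involves, here is a minimal Python sketch using the Hugging Face transformers library. The linked README remains the reference; the model ID `vasista22/whisper-hindi-large-v2`, the use of `WhisperModel.encoder` with `output_hidden_states=True`, and the trimming of padded frames are assumptions of this sketch. Whisper pads every input to 30 s and the encoder emits 1500 frames for that window (20 ms per frame), so the helper `valid_encoder_frames` (a hypothetical name) keeps only the frames that roughly cover the real audio.

```python
import math

# Assumption: Hugging Face model ID of this checkpoint; change it if it differs.
MODEL_ID = "vasista22/whisper-hindi-large-v2"

# Whisper pads/truncates audio to 30 s and the encoder emits 1500 frames for
# that window, i.e. one frame per 20 ms = 320 samples at 16 kHz.
MAX_ENCODER_FRAMES = 1500
SAMPLES_PER_ENCODER_FRAME = 320


def valid_encoder_frames(num_samples):
    """Approximate number of encoder frames covering real (unpadded) audio."""
    return min(MAX_ENCODER_FRAMES, math.ceil(num_samples / SAMPLES_PER_ENCODER_FRAME))


def encoder_embeddings(audio, sampling_rate=16000, layer=-1, model_id=MODEL_ID):
    """Return a (frames, hidden_size) tensor from one encoder layer.

    `audio` is a 1-D array of mono samples at `sampling_rate` (Whisper expects
    16 kHz). torch/transformers are imported lazily so this file loads without them.
    """
    import torch
    from transformers import WhisperModel, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperModel.from_pretrained(model_id).eval()

    # Log-mel features, zero-padded to 30 s by the feature extractor.
    features = processor(audio, sampling_rate=sampling_rate,
                         return_tensors="pt").input_features
    with torch.no_grad():
        out = model.encoder(features, output_hidden_states=True)

    # hidden_states[0] is the input to the first encoder layer;
    # hidden_states[i] is the output of layer i; [-1] is the final output.
    hidden = out.hidden_states[layer][0]  # drop the batch dimension
    # Keep only the frames that roughly correspond to real audio.
    return hidden[: valid_encoder_frames(len(audio))]
```

For example, `encoder_embeddings(samples, layer=16)` would return the output of encoder layer 16 for a 16 kHz mono array `samples`; mean-pooling over the first axis then gives one vector per clip.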

vasista22

Owner May 6, 2023

Hi,

Because this is an Automatic Speech Recognition (ASR) model, passing Hindi audio as input yields a Hindi transcription.
This model was fine-tuned for 57,000 steps on about 3,500 hours of data, which is quite substantial, so it has most likely become almost completely biased towards generating Hindi tokens (which, in a way, is the desired outcome of ASR fine-tuning). As a result, it no longer does well on the translation task.

To obtain an English translation of Hindi audio, you could try one of these three methods:

1. Use OpenAI's original Whisper model with decoder_prompt_ids set to (language="hi", task="translate"). The accuracy for Hindi->English is quite low, though.
2. Use a speech translation model. There are not many good options for Indian languages, though.
3. Use a cascade system. For example, get the Hindi output from a good ASR model (you could keep using whisper-hindi-large-v2 for this) and pass it to a machine translation model. Facebook's NLLB (https://huggingface.co/facebook/nllb-200-distilled-600M) is good enough for Hindi to English (with a 20-30% error). In my opinion, this is the better option for getting an English translation of Hindi audio.
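The first and third methods above can be sketched in Python with the Hugging Face transformers `pipeline` API. This is a minimal sketch, not a tested recipe: the model IDs, the NLLB language codes `hin_Deva`/`eng_Latn`, and passing `language`/`task` through `generate_kwargs` (supported in recent transformers releases) are assumptions, and the helper names are hypothetical. The cascade splits the Hindi transcript on the danda (।), `?` and `!` before translation because NLLB is meant for sentence-length input; that splitting step is this sketch's own choice and simply passes the whole text through if the ASR output has no punctuation.

```python
# Assumed Hugging Face model IDs; swap in the checkpoints you actually use.
WHISPER_ID = "openai/whisper-large-v2"
HINDI_ASR_ID = "vasista22/whisper-hindi-large-v2"  # assumed ID of this model
NLLB_ID = "facebook/nllb-200-distilled-600M"

# Sentence-final marks: Devanagari danda, question and exclamation marks.
# "." is left out so decimals such as "3.5" are not split.
SENTENCE_END = {"\u0964", "?", "!"}


def split_sentences(text):
    """Split a transcript after each sentence-final mark, keeping the mark."""
    sentences, current = [], ""
    for ch in text:
        current += ch
        if ch in SENTENCE_END:
            if current.strip():
                sentences.append(current.strip())
            current = ""
    if current.strip():
        sentences.append(current.strip())
    return sentences


def cascade_translate(audio, asr, translate):
    """Third method: Hindi ASR, then Hindi->English MT sentence by sentence.

    `asr` maps audio -> Hindi text and `translate` maps a list of Hindi
    sentences -> a list of English sentences. Passing them in keeps the
    pipeline logic testable without downloading any model.
    """
    sentences = split_sentences(asr(audio))
    if not sentences:
        return ""
    return " ".join(translate(sentences))


def build_hf_callables():
    """Create transformers-backed callables (lazy import; downloads models)."""
    from transformers import pipeline

    hindi_asr = pipeline("automatic-speech-recognition", model=HINDI_ASR_ID)
    nllb = pipeline("translation", model=NLLB_ID,
                    src_lang="hin_Deva", tgt_lang="eng_Latn")
    whisper = pipeline("automatic-speech-recognition", model=WHISPER_ID)

    def asr(audio):
        return hindi_asr(audio)["text"]

    def translate(sentences):
        return [r["translation_text"] for r in nllb(sentences)]

    def whisper_translate(audio):
        # First method: original Whisper with the decoder prompt set to
        # Hindi + translate, passed via generate_kwargs (recent transformers).
        out = whisper(audio, generate_kwargs={"language": "hindi", "task": "translate"})
        return out["text"]

    return asr, translate, whisper_translate
```

With real models, `asr, translate, whisper_translate = build_hf_callables()` sets everything up; `cascade_translate("clip.wav", asr, translate)` then runs the cascade and `whisper_translate("clip.wav")` runs direct Whisper translation, where `clip.wav` stands for a hypothetical audio file.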
