Transcription

#11
by Spotex93 - opened

Can you run transcription via the Inference API? It only translates to English with the provided code snippet.

You should be able to! The trick is to properly set the `forced_decoder_ids` with the processor. Otherwise, we are looking into adding the language detection automatically.

@Spotex93 , were you able to find a solution to the transcription problem? I ran into the same question. Kind regards

You can set the forced_decoder_ids as follows:

from transformers import WhisperForConditionalGeneration, WhisperProcessor

checkpoint = "openai/whisper-large"

model = WhisperForConditionalGeneration.from_pretrained(checkpoint)
processor = WhisperProcessor.from_pretrained(checkpoint)

# decode the forced token ids to see which special tokens are being forced
print("Default:")
print(model.config.forced_decoder_ids)
print(processor.batch_decode([i[1] for i in model.config.forced_decoder_ids]))

# now change to Hindi (hi)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")

print("\nHindi:")
print(model.config.forced_decoder_ids)
print(processor.batch_decode([i[1] for i in model.config.forced_decoder_ids]))

Print Output:

Default:
[[1, 50259], [2, 50359], [3, 50363]]
['<|en|>', '<|transcribe|>', '<|notimestamps|>']

Hindi:
[(1, 50276), (2, 50359), (3, 50363)]
['<|hi|>', '<|transcribe|>', '<|notimestamps|>']
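To put the forced decoder ids to use, you can also pass them directly to `generate()`. A minimal end-to-end sketch (assumptions: `openai/whisper-tiny` is used here only to keep the example light, swap in `openai/whisper-large` for real use; the audio is a placeholder sine tone standing in for real 16 kHz speech):

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

checkpoint = "openai/whisper-tiny"  # placeholder; use "openai/whisper-large" in practice
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)
processor = WhisperProcessor.from_pretrained(checkpoint)

# force Hindi transcription instead of the default English translation
forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")

# placeholder audio: 1 second of a 440 Hz tone at 16 kHz (substitute real speech)
sr = 16000
audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# convert the raw waveform to log-mel input features and generate token ids
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
predicted_ids = model.generate(
    inputs.input_features, forced_decoder_ids=forced_decoder_ids
)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```

Passing `forced_decoder_ids` per call avoids mutating `model.config`, which is handy when you transcribe several languages with the same loaded model.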

@muertinho for now I switched to https://replicate.com/openai/whisper/api#input-audio since it seems to me that this is not possible with the currently provided Inference API. Maybe you'd have to host your own.

@Spotex93 thank you for providing me with the link. I see the same issue you mentioned above and will look into replicate.com. @sanchit-gandhi thank you for your reply and efforts as well. However, I am looking for an option to run it over a cloud API.

@muertinho tell me if you found a better solution!
