Transcription

#11
by Spotex93 - opened

Can you run transcription via the Inference API? It only translates to English with the provided code snippet.

You should be able to! The trick is to properly set the `forced_decoder_ids` with the processor. Otherwise, we are looking into adding the language detection automatically.

@Spotex93 , were you able to find a solution to the transcription problem? I ran into the same question. Kind regards

You can set the forced_decoder_ids as follows:

from transformers import WhisperForConditionalGeneration, WhisperProcessor

checkpoint = "openai/whisper-large"

model = WhisperForConditionalGeneration.from_pretrained(checkpoint)
processor = WhisperProcessor.from_pretrained(checkpoint)

# decode the forced token ids to see which special tokens are being forced
print("Default:")
print(model.config.forced_decoder_ids)
print(processor.batch_decode([i[1] for i in model.config.forced_decoder_ids]))

# now change to Hindi (hi)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")

print("\nHindi:")
print(model.config.forced_decoder_ids)
print(processor.batch_decode([i[1] for i in model.config.forced_decoder_ids]))

Print Output:

Default:
[[1, 50259], [2, 50359], [3, 50363]]
['<|en|>', '<|transcribe|>', '<|notimestamps|>']

Hindi:
[(1, 50276), (2, 50359), (3, 50363)]
['<|hi|>', '<|transcribe|>', '<|notimestamps|>']
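To put the forced decoder ids to use, you can also pass them directly to `generate()`. A minimal end-to-end sketch (assumptions: `openai/whisper-tiny` is used here only to keep the example light, swap in `openai/whisper-large` for real use; the audio is a placeholder sine tone standing in for real 16 kHz speech):

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

checkpoint = "openai/whisper-tiny"  # placeholder; use "openai/whisper-large" in practice
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)
processor = WhisperProcessor.from_pretrained(checkpoint)

# force Hindi transcription instead of the default English translation
forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")

# placeholder audio: 1 second of a 440 Hz tone at 16 kHz (substitute real speech)
sr = 16000
audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# convert the raw waveform to log-mel input features and generate token ids
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
predicted_ids = model.generate(
    inputs.input_features, forced_decoder_ids=forced_decoder_ids
)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```

Passing `forced_decoder_ids` per call avoids mutating `model.config`, which is handy when you transcribe several languages with the same loaded model.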

@muertinho for now I switched to https://replicate.com/openai/whisper/api#input-audio since it seems to me that this is not possible with the currently provided Inference API. Maybe you'd have to host your own.

@Spotex93 thank you for providing me with the link. I see the same issue you mentioned above and will look into replicate.com. @sanchit-gandhi thank you for your reply and efforts as well. However, I am looking for an option to run it over a cloud API.

@muertinho tell me if you found a better solution!
