How can I detect the language of an audio file with a model loaded via WhisperForConditionalGeneration?

#40
by lnpwcd68730 - opened

This is how I expect to load the model:

from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

This is the language detection method described in the Whisper README:

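import whisper
model = whisper.load_model("base")
# load the audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# compute the log-Mel spectrogram and move it to the model's device
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")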
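
One possible way to get similar behaviour from the transformers model is sketched below. It is not taken from this thread or from the library's documentation. It assumes that a single decoder step starting from the model's decoder start token yields logits whose language-token entries (tokens of the form "<|en|>") can be compared with a softmax, which mirrors what the original whisper package does. The names language_probs, detect_language and LANGUAGE_CODES are hypothetical helpers introduced here, and the language list is a small illustrative subset.

```python
import math

# Illustrative subset of Whisper language codes (an assumption for this sketch);
# the tokenizer vocabulary holds one "<|xx|>" token per supported language.
LANGUAGE_CODES = ["en", "zh", "de", "es", "ru", "ko", "fr", "ja"]


def language_probs(logits, lang_token_ids):
    """Softmax restricted to the language-token entries of one logits row.

    logits: sequence of floats, one per vocabulary entry.
    lang_token_ids: dict mapping language code -> token id.
    """
    scores = {code: float(logits[idx]) for code, idx in lang_token_ids.items()}
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {code: math.exp(score - peak) for code, score in scores.items()}
    total = sum(exps.values())
    return {code: value / total for code, value in exps.items()}


def detect_language(model, processor, audio, sampling_rate=16000):
    """Hypothetical helper: run one decoder step and rank the language tokens.

    model: a WhisperForConditionalGeneration instance (multilingual checkpoint).
    processor: the matching WhisperProcessor.
    audio: 1-D float array of samples at `sampling_rate`.
    """
    import torch  # imported here so language_probs stays usable without torch

    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    input_features = inputs.input_features.to(model.device)
    # Feed only the decoder start token; the next-token logits then cover the
    # position where Whisper emits its language token.
    start = torch.tensor(
        [[model.config.decoder_start_token_id]], device=model.device
    )
    with torch.no_grad():
        logits = model(input_features, decoder_input_ids=start).logits[0, -1]
    tokenizer = processor.tokenizer
    lang_token_ids = {
        code: tokenizer.convert_tokens_to_ids(f"<|{code}|>")
        for code in LANGUAGE_CODES
    }
    return language_probs(logits.tolist(), lang_token_ids)
```

With a WhisperProcessor loaded from the same checkpoint as the model, calling probs = detect_language(model, processor, audio) and then max(probs, key=probs.get) would give the most likely code among those listed. The audio is assumed to be sampled at 16 kHz, and a model loaded in half precision would need the input features cast to the same dtype.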

sanchit-gandhi changed discussion status to closed
