Difference in Transcription Quality Between Local Whisper Large V2 and Model Card Inference API

#103
by nkanaka1 - opened

I've recently started using OpenAI's Whisper to transcribe audio files, specifically loading the model with `whisper.load_model("large-v2")` in my local environment. Based on the model's reported capabilities, I expected a high level of accuracy.
However, I've noticed that the transcriptions I get locally are significantly worse than those from the hosted Inference API showcased on the model card.
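For reference, here is a minimal sketch of the local setup described above, assuming the `openai-whisper` package and a hypothetical audio file path. The decoding options shown (`beam_size`, `temperature`, `fp16`) are assumptions, not the hosted API's actual settings; differing decoding defaults between a local run and the Inference API are one common source of quality differences, so pinning them explicitly makes local results reproducible:

```python
def transcribe_options(language=None):
    """Build explicit decoding options so local runs are reproducible.

    These values are illustrative assumptions, not the Inference API's
    known defaults.
    """
    return {
        "language": language,  # None lets Whisper auto-detect the language
        "beam_size": 5,        # beam search instead of greedy decoding
        "temperature": 0.0,    # deterministic decoding
        "fp16": False,         # avoid fp16 precision issues on CPU
    }


if __name__ == "__main__":
    import whisper  # pip install openai-whisper

    model = whisper.load_model("large-v2")
    # "audio.mp3" is a placeholder path for illustration.
    result = model.transcribe("audio.mp3", **transcribe_options())
    print(result["text"])
```

Comparing a run with these options against the model-card output may help narrow down whether the gap comes from decoding settings rather than the weights themselves.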
