---
license: apache-2.0
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
model-index:
- name: MahmoudAshraf/acft-whisper-large-v3-turbo
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: distil-whisper/earnings22
      type: distil-whisper/earnings22
    metrics:
    - name: WER
      type: WER
      value: 15.605
---
# Model Card

## Model Description
This model is part of a series of fine-tuned versions of OpenAI's Whisper models.
The models have been fine-tuned for robustness to dynamic audio context, allowing shorter audio contexts to be used for better performance on short audio inputs (see the sketch after the list below). The method is detailed in our GitHub repo.
- Developed by: Mahmoud Ashraf, inspired by FUTO
- License: Apache-2.0
- Fine-tuned from model: openai/whisper-large-v3-turbo
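For intuition: Whisper's encoder emits 50 positions per second of audio (100 mel frames per second, halved by a stride-2 convolution), so the full 30-second window corresponds to 1500 positions. Below is a minimal sketch of how a dynamic audio context can be derived from clip length; the helper name and `pad` parameter are illustrative, not part of our codebase.

```python
import math

FULL_AUDIO_CTX = 1500      # 30 s window: 100 mel frames/s, halved by a stride-2 conv
POSITIONS_PER_SECOND = 50  # encoder positions per second of 16 kHz audio


def dynamic_audio_ctx(duration_seconds: float, pad: int = 0) -> int:
    """Smallest audio context covering the clip, optionally padded for stability.

    The speed evaluation in the Metrics section pads the exact context by 128.
    """
    needed = math.ceil(duration_seconds * POSITIONS_PER_SECOND) + pad
    return min(FULL_AUDIO_CTX, needed)


# A 5-second clip needs 250 positions, or 378 with the 128-position stability pad.
print(dynamic_audio_ctx(5.0), dynamic_audio_ctx(5.0, pad=128))
```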
## Uses
These models are not useful by themselves under default Whisper runtime configurations.
The easiest way to test different audio context sizes is to use whisper.cpp with the `--audio-ctx`
parameter. We provide converted whisper.cpp models in our GitHub README.
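For a rough illustration of the mechanism outside whisper.cpp, here is a hedged sketch using the `openai-whisper` Python package: it truncates the encoder's sinusoidal positional embedding to a shorter context and feeds a correspondingly shorter mel segment. This is an illustration only, not a supported API or our recommended setup (use whisper.cpp for real testing), and `acft-large-v3-turbo.pt` is a hypothetical path to a converted checkpoint of this model.

```python
# A minimal sketch, assuming the openai-whisper package; slicing the encoder's
# positional embedding like this is illustrative, not a supported API.
import whisper
from whisper.audio import log_mel_spectrogram, pad_or_trim

model = whisper.load_model("acft-large-v3-turbo.pt")  # hypothetical converted checkpoint

n_ctx = 512  # reduced audio context (full context is 1500 positions = 30 s)

# Each encoder position covers two 10 ms mel frames, so use 2 * n_ctx frames.
audio = whisper.load_audio("sample.wav")
mel = log_mel_spectrogram(audio, n_mels=model.dims.n_mels)
mel = pad_or_trim(mel, 2 * n_ctx).to(model.device)

# Truncate the (sinusoidal) encoder positional embedding to match the new context.
model.encoder.positional_embedding = model.encoder.positional_embedding[:n_ctx]

result = whisper.decode(model, mel, whisper.DecodingOptions(language="en", fp16=False))
print(result.text)
```

A vanilla checkpoint tends to degrade sharply under this kind of truncation (see the Turbo Dynamic Context row below); the fine-tuned models are trained to tolerate it.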
## Metrics
Speed was evaluated with TensorRT-LLM using in-flight batching. The dynamic context was padded with an additional 128 positions for stability.
| Model Name | WER on Earnings22 (%) | Relative Speed |
|---|---|---|
| Large-V3 Full Context | 15.283 | 1.0x |
| Large-V3 Dynamic Context | 17.515 | 2.1x |
| MahmoudAshraf/acft-whisper-large-v3 | 15.381 | 2.1x |
| Large-V3 Turbo Full Context | 15.373 | 1.9x |
| Large-V3 Turbo Dynamic Context | 62.921 | 6.4x |
| This Model | 15.605 | 5.1x |
## Other Information
More information can be found in this GitHub README.