---
license: apache-2.0
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
model-index:
- name: MahmoudAshraf/acft-whisper-large-v3-turbo
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: distil-whisper/earnings22
      type: distil-whisper/earnings22
    metrics:
    - name: WER
      type: WER
      value: 15.605
---

# Model Card

## Model Description

This model is part of a fine-tuned series of [OpenAI's Whisper models](https://github.com/openai/whisper). The models have been finetuned for dynamic audio context robustness, allowing shorter audio contexts for better performance on short audio inputs. The method is detailed [in our GitHub repo](https://github.com/futo-org/whisper-acft).

- **Developed by:** Mahmoud Ashraf, inspired by FUTO
- **License:** Apache-2.0
- **Finetuned from model:** OpenAI Whisper large-v3-turbo

## Uses

These models are not useful by themselves under default Whisper runtime configurations. The easiest way to test differing audio contexts is to use whisper.cpp with the `--audio-context` parameter (see the example invocation at the end of this card). We provide converted whisper.cpp models in our [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).

## Metrics

Speed was evaluated with TensorRT-LLM using in-flight batching. The dynamic audio context was padded with an additional 128 positions for stability (see the sketch at the end of this card).

| Model Name | WER on Earnings22 | Relative Speed |
|------------|-------------------|----------------|
| Large-V3 Full Context | 15.283 | 1.0x |
| Large-V3 Dynamic Context | 17.515 | 2.1x |
| [MahmoudAshraf/acft-whisper-large-v3](https://huggingface.co/MahmoudAshraf/acft-whisper-large-v3) | 15.381 | 2.1x |
| Large-V3 Turbo Full Context | 15.373 | 1.9x |
| Large-V3 Turbo Dynamic Context | 62.921 | 6.4x |
| This Model | 15.605 | 5.1x |

## Other Information

More information can be found in this [GitHub README](https://github.com/futo-org/whisper-acft?tab=readme-ov-file#finetuning-whisper-for-dynamic-audio-context-robustness).
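
## Example Invocation

A minimal sketch of testing a reduced audio context with whisper.cpp. The model file name `ggml-acft-whisper-large-v3-turbo.bin` is hypothetical here; use the converted file from our GitHub README. Newer whisper.cpp builds name the binary `whisper-cli` instead of `main`.

```bash
# Transcribe a short clip with a reduced audio context of 256 positions.
# whisper.cpp also accepts the short form `-ac 256`.
./main -m ggml-acft-whisper-large-v3-turbo.bin -f short_clip.wav --audio-context 256
```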
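Choosing the context value is up to the caller. The Python sketch below is illustrative only, not the benchmark code: it assumes Whisper's standard 1500-position encoder context (50 positions per second of audio) and applies the 128-position stability padding described in the Metrics section above.

```python
import math

FULL_CONTEXT = 1500         # Whisper's fixed encoder context, covering 30 s of audio
POSITIONS_PER_SECOND = 50   # 1500 positions / 30 s
STABILITY_PADDING = 128     # extra positions for stability, as used in the benchmark

def dynamic_audio_context(clip_seconds: float) -> int:
    """Pick a reduced audio context for a clip of the given duration."""
    needed = math.ceil(clip_seconds * POSITIONS_PER_SECOND)
    return min(FULL_CONTEXT, needed + STABILITY_PADDING)

# e.g. a 5-second clip -> min(1500, 250 + 128) = 378
print(dynamic_audio_context(5.0))
```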