---
license: apache-2.0
base_model:
  - openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
model-index:
  - name: MahmoudAshraf/acft-whisper-large-v3-turbo
    results:
      - task:
          type: automatic-speech-recognition
        dataset:
          name: distil-whisper/earnings22
          type: distil-whisper/earnings22
        metrics:
          - name: WER
            type: WER
            value: 15.605
---

Model Card

Model Description

This model is part of a series of fine-tuned OpenAI Whisper models.

The models have been fine-tuned for robustness to dynamic audio context, allowing shorter audio contexts to be used for better performance on short audio inputs. The method is detailed in our GitHub repo.

  • Developed by: Mahmoud Ashraf, inspired by FUTO
  • License: Apache-2.0
  • Fine-tuned from model: openai/whisper-large-v3-turbo

Uses

These models are not useful on their own under default Whisper runtime configurations, which always run with the full audio context.

The easiest way to experiment with different audio contexts is whisper.cpp with the --audio-ctx parameter. We provide converted whisper.cpp models in our GitHub README.
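
As a sketch, an invocation might look like the following; the file names and context value are placeholders rather than values from this card, and the binary is named main in older whisper.cpp builds and whisper-cli in newer ones:

```bash
# Illustrative invocation; model/audio file names and the context value
# are placeholders. One encoder position covers 20 ms of audio, so
# --audio-ctx 384 limits the encoder to roughly 7.7 s of context.
./whisper-cli -m ggml-acft-whisper-large-v3-turbo.bin \
              -f short_clip.wav \
              --audio-ctx 384
```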

Metrics

Speed was evaluated with TensorRT-LLM using in-flight batching. For stability, the dynamic context was padded with an additional 128 positions, as in the sketch below.
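
A minimal sketch of that padding rule, assuming Whisper's standard layout of one encoder position per 20 ms of audio and a full context of 1500 positions; the function and constants are illustrative, not taken from the evaluation code:

```python
def dynamic_audio_context(audio_seconds: float,
                          pad: int = 128,
                          full_context: int = 1500) -> int:
    """Illustrative: choose an encoder context for a clip, with padding.

    Whisper's encoder emits one position per 20 ms of audio, so a
    30 s window corresponds to the full 1500-position context.
    """
    positions = int(round(audio_seconds / 0.02))  # 20 ms per position
    return min(positions + pad, full_context)

# e.g. a 5 s clip: 250 positions + 128 padding = 378 of 1500
print(dynamic_audio_context(5.0))  # 378
```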

| Model Name                           | WER on Earnings22 | Relative Speed |
|--------------------------------------|-------------------|----------------|
| Large-V3 Full Context                | 15.283            | 1.0x           |
| Large-V3 Dynamic Context             | 17.515            | 2.1x           |
| MahmoudAshraf/acft-whisper-large-v3  | 15.381            | 2.1x           |
| Large-V3 Turbo Full Context          | 15.373            | 1.9x           |
| Large-V3 Turbo Dynamic Context       | 62.921            | 6.4x           |
| This Model                           | 15.605            | 5.1x           |

Other Information

More information can be found in this GitHub README.