aotrih commited on
Commit
64d375d
•
1 Parent(s): 7becd73

whisperkittools generated README.md

Browse files
Files changed (1) hide show
  1. README.md +30 -28
README.md CHANGED
@@ -9,42 +9,42 @@ tags:
9
  - coreml
10
  - asr
11
  - quantized
12
- - automatic-speech-recognition
13
- inference: false
14
  ---
15
  # WhisperKit Evaluation Results
16
 
17
 
18
 
19
  ## Dataset: `librispeech`
20
-
21
- | | WER | QoI (%) | File Size (MB) |
22
- |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|----------:|-----------------:|
23
- | [WhisperOpenAIAPI/openai_whisper-large-v2](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/librispeech) | 2.85 | 100 | 3100 |
24
- | [WhisperKit/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/librispeech) | 2.48 | 95.2 | 3100 |
25
- | [WhisperKit/openai_whisper-large-v3_turbo](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo/librispeech) | 2.44 | 95.4 | 3100 |
26
- | [WhisperKit/openai_whisper-large-v3_turbo_1018MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo_1018MB/librispeech) | 2.49 | 94.8 | 1018 |
27
- | [WhisperKit/openai_whisper-large-v2](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2/librispeech) | 3.28 | 96.6 | 3100 |
28
- | [WhisperKit/openai_whisper-large-v2_1050MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_1050MB/librispeech) | 3.32 | 95 | 1050 |
29
- | [WhisperKit/openai_whisper-large-v2_turbo](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo/librispeech) | 3.24 | 96.6 | 3100 |
30
- | [WhisperKit/openai_whisper-large-v2_turbo_1022MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo_1022MB/librispeech) | 3.33 | 94.9 | 1022 |
31
- | [WhisperKit/openai_whisper-small.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small.en/librispeech) | 4.14 | 85.8 | 483 |
32
- | [WhisperKit/openai_whisper-small](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small/librispeech) | 4.03 | 83 | 483 |
33
- | [WhisperKit/openai_whisper-base.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/librispeech) | 4.79 | 75.3 | 145 |
34
- | [WhisperKit/openai_whisper-base](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech) | 6.14 | 67.2 | 145 |
35
- | [WhisperKit/openai_whisper-tiny.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech) | 6.76 | 63.9 | 66 |
36
- | [WhisperKit/openai_whisper-tiny](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech) | 8.91 | 52.5 | 66 |
37
- | [whisper.cpp/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/whisper.cpp/openai_whisper-large-v3/librispeech) | 2.35 | 95.4 | 3100 |
 
38
 
39
  ## Dataset: `earnings22`
 
40
 
41
- | | WER | QoI (%) | File Size (MB) |
42
- |:------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|----------:|-----------------:|
43
- | [WhisperOpenAIAPI/openai_whisper-large-v2](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/earnings22) | 17.08 | 100 | 3100 |
44
- | [WhisperKit/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/earnings22) | 15.91 | 58.5 | 3100 |
45
- | [WhisperKit/openai_whisper-base.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22) | 24.16 | 6.5 | 145 |
46
- | [WhisperKit/openai_whisper-tiny.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22) | 29.36 | 5.7 | 66 |
47
- | [whisper.cpp/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/whisper.cpp/openai_whisper-large-v3/earnings22) | 34.15 | 6.5 | 3100 |
48
 
49
 
50
  We believe that rigorously measuring the quality of inference is necessary for developers and
@@ -53,12 +53,14 @@ any machine learning model in production. To contextualize `WhisperKit`, we take
53
  implementations and benchmark them using a consistent evaluation harness:
54
 
55
  Server-side:
56
- - `WhisperOpenAIAPI`: [OpenAI's Whisper API](https://platform.openai.com/docs/guides/speech-to-text) ($0.36 per hour of audio as of 02/29/24, 25MB file size limit per request)
 
57
 
58
  On-device:
59
  - `WhisperKit`: Argmax's implementation [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L100) [[Repo]](https://github.com/argmaxinc/WhisperKit)
60
  - `whisper.cpp`: A C++ implementation form ggerganov [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L212) [[Repo]](https://github.com/ggerganov/whisper.cpp)
61
  - `WhisperMLX`: A Python implementation from Apple MLX [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L338) [[Repo]](https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/transcribe.py)
 
62
 
63
  `WhisperOpenAIAPI` sets the reference and we assume that it is using the equivalent of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)
64
  in float16 precision along with additional undisclosed optimizations from OpenAI. In all measurements, we care primarily about per-example no-regressions (quantified as `qoi` below)
 
9
  - coreml
10
  - asr
11
  - quantized
 
 
12
  ---
13
  # WhisperKit Evaluation Results
14
 
15
 
16
 
17
  ## Dataset: `librispeech`
18
+ (Short-form Audio (<30s/clip) - 5 hours of English audiobook clips)
19
+
20
+ | | WER (↓) | QoI (↑) | File Size (MB) |
21
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------:|----------:|-----------------:|
22
+ | [WhisperOpenAIAPI/openai_whisper-large-v2](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/librispeech) | 2.35 | 100 | 3100 |
23
+ | [WhisperKit/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/librispeech) | 2.04 | 95.2 | 3100 |
24
+ | [WhisperKit/openai_whisper-large-v3_turbo](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo/librispeech) | 2.03 | 95.4 | 3100 |
25
+ | [WhisperKit/openai_whisper-large-v3_turbo_1018MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo_1018MB/librispeech) | 1.99 | 94.8 | 1018 |
26
+ | [WhisperKit/openai_whisper-large-v2](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2/librispeech) | 2.77 | 96.6 | 3100 |
27
+ | [WhisperKit/openai_whisper-large-v2_1050MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_1050MB/librispeech) | 2.81 | 95 | 1050 |
28
+ | [WhisperKit/openai_whisper-large-v2_turbo](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo/librispeech) | 2.76 | 96.6 | 3100 |
29
+ | [WhisperKit/openai_whisper-large-v2_turbo_1022MB](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo_1022MB/librispeech) | 2.66 | 94.9 | 1022 |
30
+ | [WhisperKit/openai_whisper-small.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small.en/librispeech) | 3.12 | 85.8 | 483 |
31
+ | [WhisperKit/openai_whisper-small](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small/librispeech) | 3.45 | 83 | 483 |
32
+ | [WhisperKit/openai_whisper-base.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/librispeech) | 3.98 | 75.3 | 145 |
33
+ | [WhisperKit/openai_whisper-base](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech) | 4.97 | 67.2 | 145 |
34
+ | [WhisperKit/openai_whisper-tiny.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech) | 5.61 | 63.9 | 66 |
35
+ | [WhisperKit/openai_whisper-tiny](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech) | 7.47 | 52.5 | 66 |
36
+ | [whisper.cpp/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/whisper.cpp/openai_whisper-large-v3/librispeech) | 1.97 | 95.4 | 3100 |
37
 
38
  ## Dataset: `earnings22`
39
+ (Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents)
40
 
41
+ | | WER (↓) | QoI (↑) | File Size (MB) |
42
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------|----------:|----------:|-----------------:|
43
+ | [WhisperOpenAIAPI/openai_whisper-large-v2](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/earnings22) | 16.27 | 100 | 3100 |
44
+ | [WhisperKit/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/earnings22) | 15.17 | 58.5 | 3100 |
45
+ | [WhisperKit/openai_whisper-base.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22) | 23.49 | 6.5 | 145 |
46
+ | [WhisperKit/openai_whisper-tiny.en](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22) | 28.64 | 5.7 | 66 |
47
+ | [whisper.cpp/openai_whisper-large-v3](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/whisper.cpp/openai_whisper-large-v3/earnings22) | 33.58 | 6.5 | 3100 |
48
 
49
 
50
  We believe that rigorously measuring the quality of inference is necessary for developers and
 
53
  implementations and benchmark them using a consistent evaluation harness:
54
 
55
  Server-side:
56
+ - `WhisperOpenAIAPI`: [OpenAI's Whisper API](https://platform.openai.com/docs/guides/speech-to-text)
57
+ ($0.36 per hour of audio as of 02/29/24, 25MB file size limit per request)
58
 
59
  On-device:
60
  - `WhisperKit`: Argmax's implementation [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L100) [[Repo]](https://github.com/argmaxinc/WhisperKit)
61
  - `whisper.cpp`: A C++ implementation form ggerganov [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L212) [[Repo]](https://github.com/ggerganov/whisper.cpp)
62
  - `WhisperMLX`: A Python implementation from Apple MLX [[Eval Harness]](https://github.com/argmaxinc/whisperkittools/blob/main/whisperkit/pipelines.py#L338) [[Repo]](https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/transcribe.py)
63
+ (All on-device implementations are available for free under MIT license as of 03/19/2024)
64
 
65
  `WhisperOpenAIAPI` sets the reference and we assume that it is using the equivalent of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)
66
  in float16 precision along with additional undisclosed optimizations from OpenAI. In all measurements, we care primarily about per-example no-regressions (quantified as `qoi` below)