arda-argmax
committed on
Commit 7c9dbc2
1 Parent(s): 5be6e63
Update README.md
README.md CHANGED
@@ -37,6 +37,8 @@ Short-form Audio (<30s/clip) - 5 hours of English audiobook clips
|
|
37 |
| [base](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base) | [4.97](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech) | 67.2 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
|
38 |
| [tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [5.61](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech) | 63.9 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
|
39 |
| [tiny](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny) | [7.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech) | 52.5 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
|
|
|
|
|
40 |
|
41 |
## Dataset: `earnings22`
|
42 |
Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents
|
@@ -49,6 +51,15 @@ Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English w
|
|
49 |
| [base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [23.49](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22) | 6.5 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
|
50 |
| [tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [28.64](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22) | 5.7 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
|
51 |
52 |
|
53 |
### Explanation
|
54 |
|
@@ -99,6 +110,7 @@ WhisperKit is an SDK for building speech-to-text features in apps across a wide
|
|
99 |
### Datasets
|
100 |
- [librispeech](https://huggingface.co/datasets/argmaxinc/librispeech): ~5 hours of short English audio clips, tests short-form transcription quality
|
101 |
- [earnings22](https://huggingface.co/datasets/argmaxinc/earnings22): ~120 hours of English audio clips from earnings calls with various accents, tests long-form transcription quality
|
|
|
102 |
|
103 |
### Reproducing Results
|
104 |
Benchmark results on this page were automatically generated by [whisperkittools](https://github.com/argmaxinc/whisperkittools) using our cluster of Apple Silicon Macs as self-hosted runners on
|
|
|
37 |
| [base](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base) | [4.97](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech) | 67.2 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
|
38 |
| [tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [5.61](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech) | 63.9 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
|
39 |
| [tiny](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny) | [7.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech) | 52.5 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
|
40 |
+
| [large-v3-v20240930](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3-v20240930) | [1.94](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/librispeech) | 93.9 | 1640 | [Link](https://github.com/argmaxinc/WhisperKit/commit/c2f1b57) |
|
41 |
+
| [large-v3-v20240930_626MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3-v20240930_626MB) | [1.95](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3-v20240930_626MB/librispeech) | 93.8 | 626 | [Link](https://github.com/argmaxinc/WhisperKit/commit/3cd3ef1) |
|
42 |
|
43 |
## Dataset: `earnings22`
|
44 |
Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents
|
|
|
51 |
| [base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [23.49](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22) | 6.5 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
|
52 |
| [tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [28.64](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22) | 5.7 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
|
53 |
|
54 |
+
## Dataset: `common_voice_17_0-argmax_subset-400`
|
55 |
+
Short-form Audio (<30s/clip) - Max 400 samples per language from Common Voice 17.0 Test Set
|
56 |
+
|
57 |
+
| | es | ro | th | nl | id | sv | de | pl | fi | it | cs | en | vi | el | hu | ru | gl | fr | pt | da | File Size (MB) | Code Commit |
|
58 |
+
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|---:|:---|
|
59 |
+
| [large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [4.93](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/es) | [5.39](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/ro) | [6.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/th) | [7.03](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/nl) | [9.47](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/id) | [9.81](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/sv) | [9.89](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/de) | [10.13](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/pl) | [10.32](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/fi) | [11.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/it) | [12.04](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/cs) | [12.21](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/en) | [12.32](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/vi) | [12.35](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/el) | [12.44](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/hu) | [13.0](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/ru) | [13.06](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/gl) | [13.67](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/fr) | [13.75](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/pt) | [13.89](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3/common_voice_17_0-argmax_subset-400/forced/da) | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
|
60 |
+
| [large-v2](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2) | [6.93](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/es) | [7.86](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/ro) | [8.76](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/th) | [8.93](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/nl) | [12.2](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/id) | [12.16](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/sv) | [11.7](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/de) | [12.51](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/pl) | [13.13](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/fi) | [14.34](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/it) | [17.14](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/cs) | [12.7](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/en) | [17.69](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/vi) | [15.04](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/el) | [16.72](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/hu) | [15.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/ru) | [16.27](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/gl) | [16.21](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/fr) | [15.23](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/pt) | [16.72](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v2/common_voice_17_0-argmax_subset-400/forced/da) | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
|
61 |
+
| [large-v3-v20240930](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3-v20240930) | [6.1](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/es) | [11.41](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/ro) | [23.3](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/th) | [8.91](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/nl) | [11.11](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/id) | [12.97](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/sv) | [12.26](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/de) | [12.12](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/pl) | [15.42](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/fi) | [12.83](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/it) | [12.85](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/cs) | [12.13](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/en) | [16.92](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/vi) | [17.73](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/el) | [15.3](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/hu) | [13.28](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/ru) | [15.0](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/gl) | [15.51](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/fr) | [14.93](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/pt) | [17.63](https://hf.co/datasets/argmaxinc/whisperkit-evals-multilingual/tree/main/WhisperKit/openai_whisper-large-v3-v20240930/common_voice_17_0-argmax_subset-400/forced/da) | 1640 | [Link](https://github.com/argmaxinc/WhisperKit/commit/e3e21d4) |
|
62 |
+
|
63 |
|
64 |
### Explanation
|
65 |
|
|
|
110 |
### Datasets
|
111 |
- [librispeech](https://huggingface.co/datasets/argmaxinc/librispeech): ~5 hours of short English audio clips, tests short-form transcription quality
|
112 |
- [earnings22](https://huggingface.co/datasets/argmaxinc/earnings22): ~120 hours of English audio clips from earnings calls with various accents, tests long-form transcription quality
|
113 |
+
- [common_voice_17_0-argmax_subset-400](https://huggingface.co/datasets/argmaxinc/common_voice_17_0-argmax_subset-400): Up to 400 samples per language of multilingual audio clips with corresponding text, testing transcription quality across diverse languages and accents.
|
114 |
|
115 |
### Reproducing Results
|
116 |
Benchmark results on this page were automatically generated by [whisperkittools](https://github.com/argmaxinc/whisperkittools) using our cluster of Apple Silicon Macs as self-hosted runners on
|