trip-fontaine committed
Commit 85e88e4 • 1 parent: 561b47e
update readme

README.md CHANGED
@@ -83,11 +83,11 @@ The result is a distilled model that performs within **2% WER of [Whisper large-
 
 | Model                  | Params (M) | Rel. Latency | Short-Form WER | Long-Form WER |
 | :--------------------- | :--------: | :----------: | :------------: | :-----------: |
-| whisper-tiny | 37.8 | 4.7 | 43.
-| whisper-base | 72.6 | 3.7 | 30.
-| whisper-small | 242 | 2.3 | 16.
-| whisper-medium | 764 | 1.3 | 11.
-| whisper-large-v3 | 1540 | 1.0 | 7.
+| whisper-tiny           | 37.8       | 4.7          | 43.73          | 28.158        |
+| whisper-base           | 72.6       | 3.7          | 30.57          | 18.665        |
+| whisper-small          | 242        | 2.3          | 16.20          | 12.557        |
+| whisper-medium         | 764        | 1.3          | 11.720         | 11.023        |
+| whisper-large-v3       | 1540       | 1.0          | 7.81           | 9.008         |
 | **distil-large-v3-fr** | **756**    | **5.9**      | **9.34**       | **11.13**     |
 
 *latencies benchmarked to generate 128 tokens on A100 40GB with a batch size of 1. More details about inference performance in the [inference speed](#inference-speed) section.
@@ -618,14 +618,15 @@ The model has been tested for both in-distribution (Common Voice 17 and Multilin
 
 ### Short-Form
 
-| distil-large-v3-fr |
+| Model                  | Common Voice 17 | Multilingual Librispeech | voxpopuli  | fleurs    | RTFx        |
+| :--------------------- | :-------------: | :----------------------: | :--------: | :-------: | :---------: |
+| whisper-tiny           | 57.141          | 38.049                   | 32.346     | 47.4      | 265.226     |
+| whisper-base           | 42.58           | 25.235                   | 26.701     | 27.773    | 237.195     |
+| whisper-small          | 22.56           | 13.576                   | 14.486     | 14.165    | 196.932     |
+| whisper-medium         | 15.51           | 9.541                    | 11.836     | 9.992     | 93.428      |
+| whisper-large-v3       | 11.038          | 4.762                    | 9.83       | 5.624     | 62.845      |
+| **distil-large-v3-fr** | **12.675**      | **5.865**                | **10.832** | **7.989** | **106.291** |
+
 
 *the above datasets correspond to test splits
 
@@ -633,14 +634,15 @@ The model has been tested for both in-distribution (Common Voice 17 and Multilin
 ### Long-Form
 
 
-| Model Name |
+| Model Name         | RTFx    | [long-form test set](https://huggingface.co/datasets/eustlb/french-long-form-test) |
 | :----------------: | :-----: | :--------------------------------------------------------------------------------: |
-| whisper-tiny |
-| whisper-base |
-| whisper-small | 83.
-| whisper-medium |
-| whisper-large-v3 |
-| distil-large-v3-fr |
+| whisper-tiny       | 121.389 | 28.158                                                                             |
+| whisper-base       | 109.366 | 18.665                                                                             |
+| whisper-small      | 83.049  | 12.557                                                                             |
+| whisper-medium     | 47.807  | 11.023                                                                             |
+| whisper-large-v3   | 38.294  | 9.008                                                                              |
+| distil-large-v3-fr | 101.326 | 11.13                                                                              |
+
 
 
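All accuracy numbers in the diff above are word error rates (WER), in percent. As a reminder of what the metric measures, here is a minimal, self-contained sketch using a word-level Levenshtein distance; real evaluations like the ones reported here typically normalize text first (lowercasing, punctuation stripping) before scoring:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words, in %."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(round(wer("le chat est sur le tapis", "le chat est sur tapis"), 2))  # 16.67 (1 error / 6 words)
```

A short-form WER of 9.34 therefore means roughly 9 word-level errors per 100 reference words.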
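The RTFx columns report an inverse real-time factor: seconds of audio transcribed per second of wall-clock compute, so higher is faster. A hedged sketch of the measurement, assuming that definition and a caller-supplied `transcribe` callable (hypothetical, standing in for a real ASR pipeline):

```python
import time

def rtfx(audio_seconds: float, transcribe) -> float:
    """Inverse real-time factor: audio duration / wall-clock transcription time.

    RTFx = 100 means one hour of audio is transcribed in 36 seconds of compute.
    """
    start = time.perf_counter()
    transcribe()  # run the ASR pipeline on the audio (hypothetical callable)
    elapsed = time.perf_counter() - start
    return audio_seconds / elapsed

# Illustration with a stand-in workload instead of a real model:
print(rtfx(10.0, lambda: time.sleep(0.1)))  # roughly 100 on an idle machine
```

Under this reading, distil-large-v3-fr's long-form RTFx of 101.326 versus 38.294 for whisper-large-v3 is a ~2.6x throughput gain at a ~2 point WER cost.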