trip-fontaine committed
Commit 85e88e4 • 1 parent: 561b47e
update readme

README.md CHANGED
@@ -83,11 +83,11 @@ The result is a distilled model that performs within **2% WER of [Whisper large-
 
 | Model                  | Params (M) | Rel. Latency | Short-Form WER | Long-Form WER |
 | :--------------------- | :--------: | :----------: | :------------: | :-----------: |
-| whisper-tiny | 37.8 | 4.7 | 43.
-| whisper-base | 72.6 | 3.7 | 30.
-| whisper-small | 242 | 2.3 | 16.
-| whisper-medium | 764 | 1.3 | 11.
-| whisper-large-v3 | 1540 | 1.0 | 7.
+| whisper-tiny           | 37.8       | 4.7          | 43.73          | 28.158        |
+| whisper-base           | 72.6       | 3.7          | 30.57          | 18.665        |
+| whisper-small          | 242        | 2.3          | 16.20          | 12.557        |
+| whisper-medium         | 764        | 1.3          | 11.720         | 11.023        |
+| whisper-large-v3       | 1540       | 1.0          | 7.81           | 9.008         |
 | **distil-large-v3-fr** | **756**    | **5.9**      | **9.34**       | **11.13**     |
 
 *latencies benchmarked to generate 128 tokens on A100 40GB with a batch size of 1. More details about inference performance in the [inference speed](#inference-speed) section.
@@ -618,14 +618,15 @@ The model has been tested for both in-distribution (Common Voice 17 and Multilin
 
 ### Short-Form
 
-| distil-large-v3-fr |
+| Model                  | Common Voice 17 | Multilingual Librispeech | voxpopuli  | fleurs    | RTFx        |
+| :--------------------- | :-------------: | :----------------------: | :--------: | :-------: | :---------: |
+| whisper-tiny           | 57.141          | 38.049                   | 32.346     | 47.4      | 265.226     |
+| whisper-base           | 42.58           | 25.235                   | 26.701     | 27.773    | 237.195     |
+| whisper-small          | 22.56           | 13.576                   | 14.486     | 14.165    | 196.932     |
+| whisper-medium         | 15.51           | 9.541                    | 11.836     | 9.992     | 93.428      |
+| whisper-large-v3       | 11.038          | 4.762                    | 9.83       | 5.624     | 62.845      |
+| **distil-large-v3-fr** | **12.675**      | **5.865**                | **10.832** | **7.989** | **106.291** |
+
 
 *the above datasets correspond to test splits
 
@@ -633,14 +634,15 @@ The model has been tested for both in-distribution (Common Voice 17 and Multilin
 ### Long-Form
 
 
-| Model Name |
+| Model Name         | RTFx    | [long-form test set](https://huggingface.co/datasets/eustlb/french-long-form-test) |
 | :----------------: | :-----: | :--------------------------------------------------------------------------------: |
-| whisper-tiny |
-| whisper-base |
-| whisper-small | 83.
-| whisper-medium |
-| whisper-large-v3 |
-| distil-large-v3-fr |
+| whisper-tiny       | 121.389 | 28.158                                                                             |
+| whisper-base       | 109.366 | 18.665                                                                             |
+| whisper-small      | 83.049  | 12.557                                                                             |
+| whisper-medium     | 47.807  | 11.023                                                                             |
+| whisper-large-v3   | 38.294  | 9.008                                                                              |
+| distil-large-v3-fr | 101.326 | 11.13                                                                              |
+
 
 
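All accuracy numbers in the diff above are word error rates (WER), in percent. As a reminder of what the metric measures, here is a minimal, self-contained sketch using a word-level Levenshtein distance; real evaluations like the ones reported here typically normalize text first (lowercasing, punctuation stripping) before scoring:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words, in %."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(round(wer("le chat est sur le tapis", "le chat est sur tapis"), 2))  # 16.67 (1 error / 6 words)
```

A short-form WER of 9.34 therefore means roughly 9 word-level errors per 100 reference words.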
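The RTFx columns report an inverse real-time factor: seconds of audio transcribed per second of wall-clock compute, so higher is faster. A hedged sketch of the measurement, assuming that definition and a caller-supplied `transcribe` callable (hypothetical, standing in for a real ASR pipeline):

```python
import time

def rtfx(audio_seconds: float, transcribe) -> float:
    """Inverse real-time factor: audio duration / wall-clock transcription time.

    RTFx = 100 means one hour of audio is transcribed in 36 seconds of compute.
    """
    start = time.perf_counter()
    transcribe()  # run the ASR pipeline on the audio (hypothetical callable)
    elapsed = time.perf_counter() - start
    return audio_seconds / elapsed

# Illustration with a stand-in workload instead of a real model:
print(rtfx(10.0, lambda: time.sleep(0.1)))  # roughly 100 on an idle machine
```

Under this reading, distil-large-v3-fr's long-form RTFx of 101.326 versus 38.294 for whisper-large-v3 is a ~2.6x throughput gain at a ~2 point WER cost.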