Fixes to documentation
TRAINING.md CHANGED (+4 -5)
@@ -24,9 +24,9 @@ When doing human evaluation the results for the fine-tuned Catalan language model we

 Our hypothesis is that the evaluation on Common Voice gives better results because the model is overfitted and has lost generalization capabilities.

-**
+**3. Model degrades according to evaluation with other datasets**

-
+Results of an evaluation with other datasets:

 | | base | sc-base | small | sc-small | medium | sc-medium |
 | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
@@ -35,7 +35,7 @@ Doing a more extensive evaluation shows:
 | Son_Goku_catalan_valencian_voice | 51.90 | 85.44 | 39.87 | 65.19 | 18.99 | 71.52 |
 | Universal_Declaration_of_Human_Rights | 47.12 | 36.45 | 39.14 | 75.59 | 44.37 | 27.79 |

-As you can see,
+As you can see, the fine-tuned models perform worse than the OpenAI models in most scenarios.

 Legend:
 * "sc-" Indicates Softcatalà fine-tuned model
@@ -56,12 +56,11 @@ In our experiments
 | ----------- | ----------- |
 | OpenAI | 27.32 |
 | Whisper.cpp 1.2.1 | 38.89 |
-| HuggingFace
+| HuggingFace 4.27.1 | 93.54 |
 | CTranslate2 3.10.3 | 43.68 |

 We strongly recommend using CTranslate2 as the inference client.

-
 **5. Fine-tuning degrades timestamp prediction**

 Whisper uses timestamp tokens to indicate the timestamps of the transcribed texts.
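For readers who want to follow the CTranslate2 recommendation above, here is a minimal sketch (not part of the original TRAINING.md) of running a Whisper model through CTranslate2 via the faster-whisper package, printing the per-segment timestamps that point 5 says fine-tuning tends to degrade. The model directory `whisper-medium-ca-ct2` and the audio file `sample.wav` are placeholders, not names from this repository.

```python
# Minimal sketch, assuming the faster-whisper package (a CTranslate2-based
# Whisper client). The model directory and audio file are placeholders.
#
# A Hugging Face Whisper checkpoint can first be converted to CTranslate2
# format, e.g.:
#   ct2-transformers-converter --model openai/whisper-medium \
#       --output_dir whisper-medium-ca-ct2 \
#       --copy_files tokenizer.json --quantization int8
from faster_whisper import WhisperModel

# Load the converted model (use device="cuda" if a GPU is available).
model = WhisperModel("whisper-medium-ca-ct2", device="cpu", compute_type="int8")

# Transcribe Catalan audio; segments is a generator of transcribed chunks.
segments, info = model.transcribe("sample.wav", language="ca", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")

# Each segment carries start/end times derived from Whisper's timestamp
# tokens, which is where fine-tuned models tend to lose accuracy.
for segment in segments:
    print(f"[{segment.start:6.2f}s -> {segment.end:6.2f}s] {segment.text}")
```

This only illustrates the recommended CTranslate2 inference path; it is not the evaluation setup used to produce the tables above.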