jordimas committed on
Commit dfad97f
1 Parent(s): 802c1a9

Fixes to documentation

Files changed (1): TRAINING.md +3 -1
TRAINING.md CHANGED
@@ -20,7 +20,7 @@ The model improves in WER evaluation metric when it is evaluated against the Com
 
 **2. Model degrades according to human evaluation**
 
-When doing human evaliuation the results for finetuned Catalan language model were disapointing. The fine-tuned models clearly perform worse than the original OpenAI models as reported by all users (half dozen) that test them.
+When doing human evaluation, the results for the fine-tuned Catalan language model were disappointing. The fine-tuned models clearly perform worse than the original OpenAI models, as reported by all of the users (half a dozen) who tested them.
 
 Our hypothesis is that the evaluation on Common Voice gives better results because the model is overfitted and has lost generalization capabilities.
 
@@ -50,6 +50,8 @@ Summary as March 2023:
 
 **b**. The HuggingFace Whisper implementation performs poorly. This can be really misleading when doing evaluations, since HuggingFace is the stack used for fine-tuning.
 
+**c**. We have only been able to use the models reliably with the Whisper.cpp and CTranslate2 inference clients.
+
 In our experiments:
 
 | Whisper Client | WER |
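For context, the WER metric compared across inference clients above is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. A minimal self-contained sketch of that computation (not the project's actual evaluation code, which would typically use a library such as `jiwer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER = 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A lower WER means the transcription is closer to the reference; note that WER can exceed 1.0 when the hypothesis contains many insertions.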