aapot committed
Commit b404725
1 Parent(s): 198b426

Update README.md

Files changed (1):
  1. README.md +52 -16
README.md CHANGED
@@ -30,9 +30,23 @@ model-index:
    - name: Test CER
      type: cer
      value: 0.88
+ - task:
+     name: Automatic Speech Recognition
+     type: automatic-speech-recognition
+   dataset:
+     name: FLEURS ASR
+     type: google/fleurs
+     args: fi_fi
+   metrics:
+   - name: Test WER
+     type: wer
+     value: 12.11
+   - name: Test CER
+     type: cer
+     value: 5.65
---

- # Wav2Vec2 XLS-R for Finnish ASR
+ # Wav2vec2-xls-r-1b for Finnish ASR

This acoustic model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) for Finnish ASR. The model has been fine-tuned with 275.6 hours of Finnish transcribed speech data. Wav2Vec2 XLS-R was introduced in
[this paper](https://arxiv.org/abs/2111.09296) and first released at [this page](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#wav2vec-20).
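For quick reference, the checkpoint this card describes can be tried with the `transformers` ASR pipeline. A minimal sketch, assuming `transformers` is installed and `audio.wav` is a placeholder 16 kHz mono Finnish recording (the file name is illustrative, not part of this commit):

```python
# Minimal transcription sketch for Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2.
# Assumes: transformers installed (with ffmpeg available for audio decoding);
# "audio.wav" is a placeholder file name. Decoding with the attached n-gram
# language model additionally needs pyctcdecode and kenlm installed; without
# them the pipeline may fall back to plain CTC decoding.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2",
)

print(asr("audio.wav")["text"])
```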
@@ -184,7 +198,9 @@ The pretrained `facebook/wav2vec2-xls-r-1b` model was initialized with following

## Evaluation results

- Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) and with the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0). This model's training data includes the training splits of Common Voice 7.0 but our newest `Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned` model includes the Common Voice 9.0 so we ran tests for both versions. Note: Common Voice doesn't seem to fully preserve the test split as fixed between the dataset versions so it is possible that some of the training examples of Common Voice 9.0 are in the test split of the Common Voice 7.0 and vice versa. Thus, test result comparisons are not fully accurate between the models trained with different Common Voice versions but the comparison should still be meaningful enough.
+ Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0), the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0) and the [FLEURS ASR Finnish test split](https://huggingface.co/datasets/google/fleurs).
+
+ This model's training data includes the training splits of Common Voice 7.0, but our newer `Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned` and `Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish` models include Common Voice 9.0, so we ran tests with both Common Voice versions. Note: Common Voice doesn't seem to fully preserve the test split as fixed between dataset versions, so it is possible that some training examples of Common Voice 9.0 are in the test split of Common Voice 7.0 and vice versa. Thus, Common Voice test result comparisons between models trained with different Common Voice versions are not fully accurate, but the comparison should still be meaningful enough.
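The split-overlap caveat above can be checked directly. A hedged sketch with the `datasets` library (assumes you have accepted the gated Common Voice dataset terms on the Hugging Face Hub and are logged in; not part of this commit):

```python
# Rough check of the leakage caveat: how many reference transcripts appear in
# both the Common Voice 7.0 Finnish test split and the Common Voice 9.0
# Finnish train split. Assumes `huggingface-cli login` has been run and the
# gated mozilla-foundation dataset terms have been accepted.
from datasets import load_dataset

cv7_test = load_dataset("mozilla-foundation/common_voice_7_0", "fi", split="test")
cv9_train = load_dataset("mozilla-foundation/common_voice_9_0", "fi", split="train")

# Common Voice stores the reference transcript in the "sentence" column.
overlap = set(cv7_test["sentence"]) & set(cv9_train["sentence"])
print(f"transcripts shared by CV7 test and CV9 train: {len(overlap)}")
```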
### Common Voice 7.0 testing

@@ -194,14 +210,15 @@ To evaluate this model, run the `eval.py` script in this repository:
python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 --dataset mozilla-foundation/common_voice_7_0 --config fi --split test
```

- This model (the first row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+ This model (the fifth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:

- | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
- |----------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**4.09** |**9.73** |**0.88** |**1.65** |
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.65 |13.11 |1.20 |2.23 |
- |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.85 |13.52 |1.35 |2.44 |
- |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |8.16 |17.92 |1.97 |3.36 |
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.85 |13.52 |1.35 |2.44 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |4.13 |**9.66** |0.90 |1.66 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |8.16 |17.92 |1.97 |3.36 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.65 |13.11 |1.20 |2.23 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**4.09** |9.73 |**0.88** |**1.65** |
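The WER and CER figures in these tables are word- and character-level error rates, reported as percentages (lower is better). A small illustrative sketch with the `jiwer` library, using invented strings (the repository's `eval.py` may compute the metrics differently):

```python
# Illustration of WER / CER as reported in the tables; the example transcripts
# are invented. jiwer returns fractions, while the tables show percentages.
import jiwer

reference = "moi miten menee"   # hypothetical ground-truth transcript
hypothesis = "moi mitem menee"  # hypothetical model output

print(jiwer.wer(reference, hypothesis))  # 1 substituted word out of 3 -> ~0.333
print(jiwer.cer(reference, hypothesis))  # 1 substituted character out of 15 -> ~0.067
```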
### Common Voice 9.0 testing

@@ -211,14 +228,33 @@ To evaluate this model, run the `eval.py` script in this repository:
python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 --dataset mozilla-foundation/common_voice_9_0 --config fi --split test
```

- This model (the first row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+ This model (the fifth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:

- | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
- |----------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**3.72** |**8.96** |**0.80** |**1.52** |
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.35 |13.00 |1.14 |2.20 |
- |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.93 |14.08 |1.40 |2.59 |
- |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |7.42 |16.45 |1.79 |3.07 |
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.93 |14.08 |1.40 |2.59 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |4.13 |9.83 |0.92 |1.71 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |7.42 |16.45 |1.79 |3.07 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.35 |13.00 |1.14 |2.20 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**3.72** |**8.96** |**0.80** |**1.52** |
+
+ ### FLEURS ASR testing
+
+ To evaluate this model, run the `eval.py` script in this repository:
+
+ ```bash
+ python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 --dataset google/fleurs --config fi_fi --split test
+ ```
+
+ This model (the fifth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |13.99 |17.16 |6.07 |6.61 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |12.44 |**14.63** |5.77 |6.22 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |17.72 |23.30 |6.78 |7.67 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |20.34 |16.67 |6.97 |6.35 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**12.11** |14.89 |**5.65** |**6.06** |

## Team Members