Update README.md
README.md CHANGED
@@ -30,9 +30,23 @@ model-index:
    - name: Test CER
      type: cer
      value: 0.88
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: FLEURS ASR
      type: google/fleurs
      args: fi_fi
    metrics:
    - name: Test WER
      type: wer
      value: 12.11
    - name: Test CER
      type: cer
      value: 5.65
---

# Wav2vec2-xls-r-1b for Finnish ASR

This acoustic model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) for Finnish ASR. The model has been fine-tuned with 275.6 hours of Finnish transcribed speech data. Wav2Vec2 XLS-R was introduced in [this paper](https://arxiv.org/abs/2111.09296) and first released at [this page](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#wav2vec-20).
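
As a quick-start note added here (not part of the original card), the model can be loaded through the `transformers` ASR pipeline. A minimal sketch, assuming `transformers` with audio support is installed and `audio.wav` is a placeholder recording of yours:

```python
# Minimal inference sketch (an illustration added to this edit, not from
# the original card). The pipeline decodes and resamples the input file
# itself; "audio.wav" is a placeholder path.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2",
)

# When pyctcdecode and kenlm are installed, the pipeline can also use the
# n-gram language model shipped with this repository for decoding.
print(asr("audio.wav")["text"])
```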

@@ -184,7 +198,9 @@ The pretrained `facebook/wav2vec2-xls-r-1b` model was initialized with following

## Evaluation results

Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0), the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0) and the [FLEURS ASR Finnish test split](https://huggingface.co/datasets/google/fleurs).

This model's training data includes the training splits of Common Voice 7.0, while our newer `Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned` and `Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish` models include Common Voice 9.0, so we ran tests with both Common Voice versions. Note: Common Voice does not seem to keep the test split fixed between dataset versions, so it is possible that some training examples of Common Voice 9.0 appear in the test split of Common Voice 7.0 and vice versa. Thus, Common Voice test results are not fully comparable between models trained with different Common Voice versions, but the comparison should still be meaningful enough.
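
For concreteness, here is a hedged sketch of how the reported metrics are computed; the authoritative logic lives in this repository's `eval.py`, the sketch assumes the `evaluate` package, and the transcript strings are made-up examples:

```python
# WER/CER computation sketch (assumes the `evaluate` package; eval.py in
# this repository is the authoritative implementation).
import evaluate

wer = evaluate.load("wer")
cer = evaluate.load("cer")

predictions = ["moi maailma"]   # hypothetical model outputs
references = ["moi maailmaa"]   # hypothetical reference transcripts

# compute() returns fractions; the tables below report percentages.
print("WER %:", 100 * wer.compute(predictions=predictions, references=references))
print("CER %:", 100 * cer.compute(predictions=predictions, references=references))
```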

### Common Voice 7.0 testing

To evaluate this model, run the `eval.py` script in this repository:

```bash
python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 --dataset mozilla-foundation/common_voice_7_0 --config fi --split test
```

This model (the fifth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:

|                                                        | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
|--------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
| Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned    | 95 million       | 5.85          | 13.52            | 1.35          | 2.44             |
| Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million      | 4.13          | **9.66**         | 0.90          | 1.66             |
| Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm              | 300 million      | 8.16          | 17.92            | 1.97          | 3.36             |
| Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm                | 1000 million     | 5.65          | 13.11            | 1.20          | 2.23             |
| Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2             | 1000 million     | **4.09**      | 9.73             | **0.88**      | **1.65**         |
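
The "with LM" and "without LM" columns refer to decoding with or without the n-gram language model shipped with each model. A sketch of the two decoding modes, assuming `pyctcdecode`, `kenlm` and `librosa` are installed and `audio.wav` is again a placeholder file:

```python
# Contrast of the two decoding modes behind the table columns (a sketch
# under assumptions, not this repo's eval.py): greedy CTC vs. LM-boosted.
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

model_id = "Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2"
model = Wav2Vec2ForCTC.from_pretrained(model_id)
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)

speech, _ = librosa.load("audio.wav", sr=16_000)  # placeholder input file
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# "Without LM": greedy argmax over the CTC logits.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.tokenizer.batch_decode(pred_ids)[0])

# "With LM": beam-search decoding rescored by the bundled n-gram LM.
print(processor.batch_decode(logits.numpy()).text[0])
```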

### Common Voice 9.0 testing

To evaluate this model, run the `eval.py` script in this repository:

```bash
python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 --dataset mozilla-foundation/common_voice_9_0 --config fi --split test
```

This model (the fifth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:

|                                                        | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
|--------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
| Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned    | 95 million       | 5.93          | 14.08            | 1.40          | 2.59             |
| Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million      | 4.13          | 9.83             | 0.92          | 1.71             |
| Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm              | 300 million      | 7.42          | 16.45            | 1.79          | 3.07             |
| Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm                | 1000 million     | 5.35          | 13.00            | 1.14          | 2.20             |
| Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2             | 1000 million     | **3.72**      | **8.96**         | **0.80**      | **1.52**         |

### FLEURS ASR testing

To evaluate this model, run the `eval.py` script in this repository:

```bash
python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 --dataset google/fleurs --config fi_fi --split test
```
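
As a side note added here, the `fi_fi` config above selects the Finnish portion of FLEURS. A small sketch for inspecting that test split (assumes the `datasets` package; field names taken from the FLEURS dataset card):

```python
# Load and inspect the FLEURS Finnish test split used by the command above.
from datasets import load_dataset

fleurs_test = load_dataset("google/fleurs", "fi_fi", split="test")

example = fleurs_test[0]
print(example["transcription"])            # reference transcript
print(example["audio"]["sampling_rate"])   # FLEURS audio is 16 kHz
```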

This model (the fifth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:

|                                                        | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
|--------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
| Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned    | 95 million       | 13.99         | 17.16            | 6.07          | 6.61             |
| Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million      | 12.44         | **14.63**        | 5.77          | 6.22             |
| Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm              | 300 million      | 17.72         | 23.30            | 6.78          | 7.67             |
| Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm                | 1000 million     | 20.34         | 16.67            | 6.97          | 6.35             |
| Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2             | 1000 million     | **12.11**     | 14.89            | **5.65**      | **6.06**         |

## Team Members