aapot committed
Commit cadc726
1 Parent(s): 33a45fe

Update README.md

Files changed (1): README.md +52 -16
README.md CHANGED
@@ -30,9 +30,23 @@ model-index:
      - name: Test CER
        type: cer
        value: 1.97
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: FLEURS ASR
+      type: google/fleurs
+      args: fi_fi
+    metrics:
+      - name: Test WER
+        type: wer
+        value: 17.72
+      - name: Test CER
+        type: cer
+        value: 6.78
  ---
 
- # Wav2Vec2 XLS-R for Finnish ASR
+ # Wav2vec2-xls-r-300m for Finnish ASR
 
  This acoustic model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) for Finnish ASR. The model has been fine-tuned with 275.6 hours of Finnish transcribed speech data. Wav2Vec2 XLS-R was introduced in
  [this paper](https://arxiv.org/abs/2111.09296) and first released at [this page](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#wav2vec-20).
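
For quick reference, here is a minimal transcription sketch for the fine-tuned model described above. It assumes `transformers` and `torch` are installed and that `audio.wav` is a placeholder for a 16 kHz mono Finnish speech file; the `pipeline` helper is the generic Hugging Face ASR API, not something specific to this commit.

```python
# Minimal sketch: transcribe Finnish speech with this model via the generic
# Hugging Face ASR pipeline. "audio.wav" is a placeholder file path.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm",
)

result = asr("audio.wav")  # returns a dict like {"text": "..."}
print(result["text"])
```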
@@ -184,7 +198,9 @@ The pretrained `facebook/wav2vec2-xls-r-300m` model was initialized with followi
 
  ## Evaluation results
 
- Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) and with the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0). This model's training data includes the training splits of Common Voice 7.0 but our newest `Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned` model includes the Common Voice 9.0 so we ran tests for both versions. Note: Common Voice doesn't seem to fully preserve the test split as fixed between the dataset versions so it is possible that some of the training examples of Common Voice 9.0 are in the test split of the Common Voice 7.0 and vice versa. Thus, test result comparisons are not fully accurate between the models trained with different Common Voice versions but the comparison should still be meaningful enough.
+ Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0), the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0) and the [FLEURS ASR Finnish test split](https://huggingface.co/datasets/google/fleurs).
+
+ This model's training data includes the training split of Common Voice 7.0, while our newer `Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned` and `Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish` models include Common Voice 9.0, so we ran tests with both Common Voice versions. Note: Common Voice does not seem to keep the test split fixed between dataset versions, so some training examples from Common Voice 9.0 may appear in the Common Voice 7.0 test split and vice versa. Test result comparisons between models trained on different Common Voice versions are therefore not fully accurate, but they should still be meaningful.
 
  ### Common Voice 7.0 testing
 
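The tables below report WER (Word Error Rate) and CER (Character Error Rate). As a reminder of what these metrics measure, here is a small illustrative computation with the `jiwer` package (an assumption for illustration only; `eval.py` may use different tooling and text normalization):

```python
# WER = word-level edit distance / number of reference words;
# CER = character-level edit distance / number of reference characters.
import jiwer

reference = "minä puhun suomea"
hypothesis = "minä puhun suomia"  # one wrong word, one wrong character

print(jiwer.wer(reference, hypothesis))  # 1 of 3 words wrong -> ~0.33
print(jiwer.cer(reference, hypothesis))  # 1 of 17 characters wrong -> ~0.06
```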
@@ -194,14 +210,15 @@ To evaluate this model, run the `eval.py` script in this repository:
  python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm --dataset mozilla-foundation/common_voice_7_0 --config fi --split test
  ```
 
- This model (the fourth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+ This model (the third row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
 
- | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
- |----------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**4.09** |**9.73** |**0.88** |**1.65** |
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.65 |13.11 |1.20 |2.23 |
- |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.85 |13.52 |1.35 |2.44 |
- |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |8.16 |17.92 |1.97 |3.36 |
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.85 |13.52 |1.35 |2.44 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |4.13 |**9.66** |0.90 |1.66 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |8.16 |17.92 |1.97 |3.36 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.65 |13.11 |1.20 |2.23 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**4.09** |9.73 |**0.88** |**1.65** |
 
  ### Common Voice 9.0 testing
 
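The "with LM" and "without LM" columns refer to CTC decoding with and without an external n-gram language model. Below is a minimal sketch of that difference, assuming (as the `-lm` suffix in the model name suggests) that the repository bundles a pyctcdecode-compatible language model; the silent waveform is a placeholder for real 16 kHz speech:

```python
# Sketch: CTC decoding with and without the bundled n-gram language model.
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

model_id = "Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)  # needs pyctcdecode + kenlm
model = Wav2Vec2ForCTC.from_pretrained(model_id)

speech = np.zeros(16_000, dtype=np.float32)  # placeholder: 1 s of silence at 16 kHz

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Without LM: greedy CTC decoding (argmax over the vocabulary per frame).
greedy_ids = torch.argmax(logits, dim=-1)
print(processor.tokenizer.batch_decode(greedy_ids)[0])

# With LM: beam search over the CTC lattice, scored by the n-gram model.
print(processor.batch_decode(logits.numpy()).text[0])
```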
@@ -211,14 +228,33 @@ To evaluate this model, run the `eval.py` script in this repository:
  python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm --dataset mozilla-foundation/common_voice_9_0 --config fi --split test
  ```
 
- This model (the fourth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+ This model (the third row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
 
- | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
- |----------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**3.72** |**8.96** |**0.80** |**1.52** |
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.35 |13.00 |1.14 |2.20 |
- |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.93 |14.08 |1.40 |2.59 |
- |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |7.42 |16.45 |1.79 |3.07 |
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.93 |14.08 |1.40 |2.59 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |4.13 |9.83 |0.92 |1.71 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |7.42 |16.45 |1.79 |3.07 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.35 |13.00 |1.14 |2.20 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**3.72** |**8.96** |**0.80** |**1.52** |
+
+ ### FLEURS ASR testing
+
+ To evaluate this model, run the `eval.py` script in this repository:
+
+ ```bash
+ python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm --dataset google/fleurs --config fi_fi --split test
+ ```
+
+ This model (the third row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
 
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |13.99 |17.16 |6.07 |6.61 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |12.44 |**14.63** |5.77 |6.22 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |17.72 |23.30 |6.78 |7.67 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |20.34 |16.67 |6.97 |6.35 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**12.11** |14.89 |**5.65** |**6.06** |
 
  ## Team Members
 
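The FLEURS test split used above can be inspected directly with the `datasets` library; the `fi_fi` config matches the `--config` flag in the eval command. The column names below follow the published FLEURS schema and are worth double-checking against the dataset card:

```python
# Sketch: load the FLEURS Finnish test split used in the evaluation above.
from datasets import load_dataset

fleurs = load_dataset("google/fleurs", "fi_fi", split="test")
print(fleurs)  # row count and column names

sample = fleurs[0]
print(sample["transcription"])           # reference transcript
print(sample["audio"]["sampling_rate"])  # 16 kHz audio
```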