Commit 504d95e by AndrewMcDowell (1 parent: bf0f232)

Update README.md

Add eval results.

Files changed (1)
  1. README.md +27 -2
README.md CHANGED
@@ -9,8 +9,23 @@ tags:
  datasets:
  - common_voice
  model-index:
- - name: ''
-   results: []
+ - name: 'XLS-R-300-m'
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice 8
+       type: mozilla-foundation/common_voice_8_0
+       args: ja
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 95.82
+     - name: Test CER
+       type: cer
+       value: 23.64
+
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -19,6 +34,9 @@ should probably proofread and complete it, then remove this comment. -->
  #
 
  This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - JA dataset.
+
+ Kanji are converted into Hiragana using the [pykakasi](https://pykakasi.readthedocs.io/en/latest/index.html) library during training and evaluation. The model can output both Hiragana and Katakana characters.
+
  It achieves the following results on the evaluation set:
  - Loss: 0.5212
  - Wer: 1.3068
@@ -72,3 +90,10 @@ The following hyperparameters were used during training:
  - Pytorch 1.10.2+cu102
  - Datasets 1.18.2.dev0
  - Tokenizers 0.11.0
+
+ #### Evaluation Commands
+ 1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`
+
+ ```bash
+ python ./eval.py --model_id AndrewMcDowell/wav2vec2-xls-r-300m-japanese --dataset mozilla-foundation/common_voice_8_0 --config ja --split test --log_outputs
+ ```
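
The model-index metadata added in this commit records a Test WER of 95.82 next to a Test CER of 23.64. The commit does not explain the gap. One plausible reading is that the transcripts contain few or no spaces, so a whitespace-split word error rate treats nearly a whole sentence as one word. The sketch below illustrates that effect with a plain Levenshtein distance. The function names `edit_distance`, `wer`, and `cer` and the sample sentence are hypothetical, and this is not the scoring code used by `eval.py`.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate, assuming words are separated by whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; any spaces count as ordinary characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical Hiragana sentence with no spaces; one character differs.
reference = "きょうはいいてんきです"
hypothesis = "きょうはいいてんきでず"

print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

Under these assumptions a single wrong character makes the whole unsegmented sentence count as a word error, while the character error rate stays low. That is why CER is often the more informative of the two figures for text without word boundaries.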