comodoro committed 439702c (1 parent: ca1377c)

Update README.md

Files changed (1): README.md (+33 -74)

README.md CHANGED
@@ -10,8 +10,11 @@ tags:
 - xlsr-fine-tuning-week
 datasets:
 - common_voice
+- ovm
+- pscr
+- vystadial2016
 model-index:
-- name: Czech comodoro Wav2Vec2 XLSR 300M CV8
+- name: Czech comodoro Wav2Vec2 XLSR 300M 250h data
   results:
   - task:
       name: Automatic Speech Recognition
@@ -23,25 +26,29 @@ model-index:
     metrics:
     - name: Test WER
      type: wer
-      value: 10.3
+      value: 10.0
     - name: Test CER
       type: cer
       value: 2.6
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
-# wav2vec2-xls-r-300m-cs-cv8
+# Czech wav2vec2-xls-r-300m-cs-250
 
-This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice 8.0 dataset.
-It achieves the following results on the evaluation set while training:
-- Loss: 0.2327
-- Wer: 0.1608
-- Cer: 0.0376
+This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice 8.0 dataset, as well as the other datasets listed below.
+
+It achieves the following results on the evaluation set:
+- eval_loss: 0.1304
+- eval_wer: 0.1517
+- eval_cer: 0.0326
+- eval_runtime: 358.9895
+- eval_samples_per_second: 20.243
+- eval_steps_per_second: 2.532
+- epoch: 3.13
+- step: 31200
 
 The `eval.py` script results using an LM are:
-WER: 0.10281503199350225
-CER: 0.02622802241689026
+WER: 0.10053685691079459
+CER: 0.025859623842234124
 
 ## Model description
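The `eval.py` script itself is not part of this diff. Purely for orientation, LM-boosted decoding of the kind that produced the WER/CER figures above can be sketched with `transformers.Wav2Vec2ProcessorWithLM` — a minimal sketch, assuming the model repository bundles an n-gram language model (the LM results above suggest it does) and that `pyctcdecode` and `kenlm` are installed:

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

model_id = "comodoro/wav2vec2-xls-r-300m-cs-250"
# loads the tokenizer, the feature extractor and the bundled LM decoder
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# `speech` is assumed to be a 16 kHz mono waveform as a 1-D float array,
# prepared exactly as in the usage snippet further below
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# beam-search decoding against the n-gram LM instead of plain argmax
print(processor.batch_decode(logits.numpy()).text)
```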
@@ -59,8 +66,8 @@ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 
 test_dataset = load_dataset("mozilla-foundation/common_voice_8_0", "cs", split="test[:2%]")
 
-processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-cv8")
-model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-cv8")
+processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")
+model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")
 
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
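The hunk above shows only the changed lines of the README's usage snippet; the diff elides the surrounding lines. For context, the standard XLSR inference pattern these fragments belong to looks roughly as follows — a sketch with greedy argmax decoding, where the helper name `speech_file_to_array_fn` follows the usual model-card convention and is an assumption here:

```python
import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("mozilla-foundation/common_voice_8_0", "cs", split="test[:2%]")

processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")
model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")

# Common Voice ships 48 kHz audio; the model expects 16 kHz input
resampler = torchaudio.transforms.Resample(48_000, 16_000)

def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset[:2]["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)
print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset[:2]["sentence"])
```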
@@ -87,83 +94,35 @@ print("Reference:", test_dataset[:2]["sentence"])
 
 The model can be evaluated using the attached `eval.py` script:
 ```
-python eval.py --model_id comodoro/wav2vec2-xls-r-300m-cs-cv8 --dataset mozilla-foundation/common_voice_8_0 --split test --config cs
+python eval.py --model_id comodoro/wav2vec2-xls-r-300m-cs-250 --dataset mozilla-foundation/common_voice_8_0 --split test --config cs
 ```
 
 ## Training and evaluation data
 
-The Common Voice 8.0 `train` and `validation` datasets were used for training
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during the first stage of training:
-
-- learning_rate: 7e-05
-- train_batch_size: 32
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 20
-- total_train_batch_size: 640
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- num_epochs: 150
-- mixed_precision_training: Native AMP
-
-The following hyperparameters were used during the second stage of training:
-
-- learning_rate: 0.001
-- train_batch_size: 32
+The Common Voice 8.0 `train` and `validation` datasets were used for training, as well as the following datasets:
+
+- Šmídl, Luboš and Pražák, Aleš, 2013, OVM – Otázky Václava Moravce, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11858/00-097C-0000-000D-EC98-3.
+
+- Pražák, Aleš and Šmídl, Luboš, 2012, Czech Parliament Meetings, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11858/00-097C-0000-0005-CF9C-4.
+
+- Plátek, Ondřej; Dušek, Ondřej and Jurčíček, Filip, 2016, Vystadial 2016 – Czech data, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-1740.
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 16
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 20
-- total_train_batch_size: 640
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
+- lr_scheduler_warmup_steps: 600
 - num_epochs: 50
 - mixed_precision_training: Native AMP
 
-### Training results
-
-| Training Loss | Epoch  | Step | Validation Loss | Wer    | Cer    |
-|:-------------:|:------:|:----:|:---------------:|:------:|:------:|
-| 7.2926        | 8.06   | 250  | 3.8497          | 1.0    | 1.0    |
-| 3.417         | 16.13  | 500  | 3.2852          | 1.0    | 0.9857 |
-| 2.0264        | 24.19  | 750  | 0.7099          | 0.7342 | 0.1768 |
-| 0.4018        | 32.25  | 1000 | 0.6188          | 0.6415 | 0.1551 |
-| 0.2444        | 40.32  | 1250 | 0.6632          | 0.6362 | 0.1600 |
-| 0.1882        | 48.38  | 1500 | 0.6070          | 0.5783 | 0.1388 |
-| 0.153         | 56.44  | 1750 | 0.6425          | 0.5720 | 0.1377 |
-| 0.1214        | 64.51  | 2000 | 0.6363          | 0.5546 | 0.1337 |
-| 0.1011        | 72.57  | 2250 | 0.6310          | 0.5222 | 0.1224 |
-| 0.0879        | 80.63  | 2500 | 0.6353          | 0.5258 | 0.1253 |
-| 0.0782        | 88.7   | 2750 | 0.6078          | 0.4904 | 0.1127 |
-| 0.0709        | 96.76  | 3000 | 0.6465          | 0.4960 | 0.1154 |
-| 0.0661        | 104.82 | 3250 | 0.6622          | 0.4945 | 0.1166 |
-| 0.0616        | 112.89 | 3500 | 0.6440          | 0.4786 | 0.1104 |
-| 0.0579        | 120.95 | 3750 | 0.6815          | 0.4887 | 0.1144 |
-| 0.0549        | 129.03 | 4000 | 0.6603          | 0.4780 | 0.1105 |
-| 0.0527        | 137.09 | 4250 | 0.6652          | 0.4749 | 0.1090 |
-| 0.0506        | 145.16 | 4500 | 0.6958          | 0.4846 | 0.1133 |
-
-Further fine-tuning with a slightly different architecture and a higher learning rate:
-
-| Training Loss | Epoch | Step | Validation Loss | Wer    | Cer    |
-|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
-| 0.576         | 8.06  | 250  | 0.2411          | 0.2340 | 0.0502 |
-| 0.2564        | 16.13 | 500  | 0.2305          | 0.2097 | 0.0492 |
-| 0.2018        | 24.19 | 750  | 0.2371          | 0.2059 | 0.0494 |
-| 0.1549        | 32.25 | 1000 | 0.2298          | 0.1844 | 0.0435 |
-| 0.1224        | 40.32 | 1250 | 0.2288          | 0.1725 | 0.0407 |
-| 0.1004        | 48.38 | 1500 | 0.2327          | 0.1608 | 0.0376 |
-
 ### Framework versions
 
-- Transformers 4.16.0.dev0
+- Transformers 4.16.2
 - Pytorch 1.10.1+cu102
-- Datasets 1.17.1.dev0
+- Datasets 1.18.3
 - Tokenizers 0.11.0
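The training script itself is not part of this commit. Purely as orientation, here is how the hyperparameter list in the new README might map onto `transformers.TrainingArguments`, assuming the standard `Trainer` was used; `output_dir` is a hypothetical name:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-cs-250",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=600,
    num_train_epochs=50,
    fp16=True,  # "mixed_precision_training: Native AMP"
)
# The Adam betas=(0.9, 0.999) and epsilon=1e-08 listed above are the
# Trainer defaults, so they need no explicit arguments here.
```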
 
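The Test WER/CER values in the metadata are the `eval.py` figures above expressed as percentages (10.0 ≈ 100 × 0.1005, 2.6 ≈ 100 × 0.0259). For completeness, a sketch of how such scores can be computed with the `datasets` metrics matching the framework versions listed; the example strings are hypothetical, and the WER metric additionally requires `jiwer`:

```python
from datasets import load_metric

wer_metric = load_metric("wer")
cer_metric = load_metric("cer")

predictions = ["příliš žluťoučký kůň"]      # hypothetical model output
references = ["příliš žluťoučký kůň úpěl"]  # hypothetical ground truth

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```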