xls-asr-vi-40h / README.md
anton-l's picture
anton-l HF staff
Upload README.md
34f61a5
---
license: apache-2.0
language:
- vi
tags:
- automatic-speech-recognition
- common-voice
- hf-asr-leaderboard
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_7_0
model-index:
- name: xls-asr-vi-40h
results:
- task:
name: Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice 7.0
type: mozilla-foundation/common_voice_7_0
args: vi
metrics:
- name: Test WER (with Language model)
type: wer
value: 56.57
---
# xls-asr-vi-40h
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common voice 7.0 vi & private dataset.
It achieves the following results on the evaluation set (Without Language Model):
- Loss: 1.1177
- Wer: 60.58
## Evaluation
Please run the eval.py file
```bash
!python eval_custom.py --model_id geninhu/xls-asr-vi-40h --dataset mozilla-foundation/common_voice_7_0 --config vi --split test
```
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1500
- num_epochs: 50.0
- mixed_precision_training: Native AMP
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 23.3878 | 0.93 | 1500 | 21.9179 | 1.0 |
| 8.8862 | 1.85 | 3000 | 6.0599 | 1.0 |
| 4.3701 | 2.78 | 4500 | 4.3837 | 1.0 |
| 4.113 | 3.7 | 6000 | 4.2698 | 0.9982 |
| 3.9666 | 4.63 | 7500 | 3.9726 | 0.9989 |
| 3.5965 | 5.56 | 9000 | 3.7124 | 0.9975 |
| 3.3944 | 6.48 | 10500 | 3.5005 | 1.0057 |
| 3.304 | 7.41 | 12000 | 3.3710 | 1.0043 |
| 3.2482 | 8.33 | 13500 | 3.4201 | 1.0155 |
| 3.212 | 9.26 | 15000 | 3.3732 | 1.0151 |
| 3.1778 | 10.19 | 16500 | 3.2763 | 1.0009 |
| 3.1027 | 11.11 | 18000 | 3.1943 | 1.0025 |
| 2.9905 | 12.04 | 19500 | 2.8082 | 0.9703 |
| 2.7095 | 12.96 | 21000 | 2.4993 | 0.9302 |
| 2.4862 | 13.89 | 22500 | 2.3072 | 0.9140 |
| 2.3271 | 14.81 | 24000 | 2.1398 | 0.8949 |
| 2.1968 | 15.74 | 25500 | 2.0594 | 0.8817 |
| 2.111 | 16.67 | 27000 | 1.9404 | 0.8630 |
| 2.0387 | 17.59 | 28500 | 1.8895 | 0.8497 |
| 1.9504 | 18.52 | 30000 | 1.7961 | 0.8315 |
| 1.9039 | 19.44 | 31500 | 1.7433 | 0.8213 |
| 1.8342 | 20.37 | 33000 | 1.6790 | 0.7994 |
| 1.7824 | 21.3 | 34500 | 1.6291 | 0.7825 |
| 1.7359 | 22.22 | 36000 | 1.5783 | 0.7706 |
| 1.7053 | 23.15 | 37500 | 1.5248 | 0.7492 |
| 1.6504 | 24.07 | 39000 | 1.4930 | 0.7406 |
| 1.6263 | 25.0 | 40500 | 1.4572 | 0.7348 |
| 1.5893 | 25.93 | 42000 | 1.4202 | 0.7161 |
| 1.5669 | 26.85 | 43500 | 1.3987 | 0.7143 |
| 1.5277 | 27.78 | 45000 | 1.3512 | 0.6991 |
| 1.501 | 28.7 | 46500 | 1.3320 | 0.6879 |
| 1.4781 | 29.63 | 48000 | 1.3112 | 0.6788 |
| 1.4477 | 30.56 | 49500 | 1.2850 | 0.6657 |
| 1.4483 | 31.48 | 51000 | 1.2813 | 0.6633 |
| 1.4065 | 32.41 | 52500 | 1.2475 | 0.6541 |
| 1.3779 | 33.33 | 54000 | 1.2244 | 0.6503 |
| 1.3788 | 34.26 | 55500 | 1.2116 | 0.6407 |
| 1.3428 | 35.19 | 57000 | 1.1938 | 0.6352 |
| 1.3453 | 36.11 | 58500 | 1.1927 | 0.6340 |
| 1.3137 | 37.04 | 60000 | 1.1699 | 0.6252 |
| 1.2984 | 37.96 | 61500 | 1.1666 | 0.6229 |
| 1.2927 | 38.89 | 63000 | 1.1585 | 0.6188 |
| 1.2919 | 39.81 | 64500 | 1.1618 | 0.6190 |
| 1.293 | 40.74 | 66000 | 1.1479 | 0.6181 |
| 1.2853 | 41.67 | 67500 | 1.1423 | 0.6202 |
| 1.2687 | 42.59 | 69000 | 1.1315 | 0.6131 |
| 1.2603 | 43.52 | 70500 | 1.1333 | 0.6128 |
| 1.2577 | 44.44 | 72000 | 1.1191 | 0.6079 |
| 1.2435 | 45.37 | 73500 | 1.1177 | 0.6079 |
| 1.251 | 46.3 | 75000 | 1.1211 | 0.6092 |
| 1.2482 | 47.22 | 76500 | 1.1177 | 0.6060 |
| 1.2422 | 48.15 | 78000 | 1.1227 | 0.6097 |
| 1.2485 | 49.07 | 79500 | 1.1187 | 0.6071 |
| 1.2425 | 50.0 | 81000 | 1.1177 | 0.6058 |
### Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0