---
license: apache-2.0
language:
- vi
tags:
- automatic-speech-recognition
- common-voice
- hf-asr-leaderboard
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_7_0
model-index:
- name: xls-asr-vi-40h
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 7.0
      type: mozilla-foundation/common_voice_7_0
      args: vi
    metrics:
    - name: Test WER (with Language model)
      type: wer
      value: 56.57
---

# xls-asr-vi-40h

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Vietnamese (`vi`) subset of Common Voice 7.0 and a private dataset.
It achieves the following results on the evaluation set (without a language model):
- Loss: 1.1177
- WER: 60.58% (listed as 0.6058 in the training results table)

## Evaluation

Run the evaluation script:

```bash
python eval_custom.py --model_id geninhu/xls-asr-vi-40h --dataset mozilla-foundation/common_voice_7_0 --config vi --split test
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1500
- num_epochs: 50.0
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step  | Validation Loss | WER    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 23.3878       | 0.93  | 1500  | 21.9179         | 1.0    |
| 8.8862        | 1.85  | 3000  | 6.0599          | 1.0    |
| 4.3701        | 2.78  | 4500  | 4.3837          | 1.0    |
| 4.113         | 3.7   | 6000  | 4.2698          | 0.9982 |
| 3.9666        | 4.63  | 7500  | 3.9726          | 0.9989 |
| 3.5965        | 5.56  | 9000  | 3.7124          | 0.9975 |
| 3.3944        | 6.48  | 10500 | 3.5005          | 1.0057 |
| 3.304         | 7.41  | 12000 | 3.3710          | 1.0043 |
| 3.2482        | 8.33  | 13500 | 3.4201          | 1.0155 |
| 3.212         | 9.26  | 15000 | 3.3732          | 1.0151 |
| 3.1778        | 10.19 | 16500 | 3.2763          | 1.0009 |
| 3.1027        | 11.11 | 18000 | 3.1943          | 1.0025 |
| 2.9905        | 12.04 | 19500 | 2.8082          | 0.9703 |
| 2.7095        | 12.96 | 21000 | 2.4993          | 0.9302 |
| 2.4862        | 13.89 | 22500 | 2.3072          | 0.9140 |
| 2.3271        | 14.81 | 24000 | 2.1398          | 0.8949 |
| 2.1968        | 15.74 | 25500 | 2.0594          | 0.8817 |
| 2.111         | 16.67 | 27000 | 1.9404          | 0.8630 |
| 2.0387        | 17.59 | 28500 | 1.8895          | 0.8497 |
| 1.9504        | 18.52 | 30000 | 1.7961          | 0.8315 |
| 1.9039        | 19.44 | 31500 | 1.7433          | 0.8213 |
| 1.8342        | 20.37 | 33000 | 1.6790          | 0.7994 |
| 1.7824        | 21.3  | 34500 | 1.6291          | 0.7825 |
| 1.7359        | 22.22 | 36000 | 1.5783          | 0.7706 |
| 1.7053        | 23.15 | 37500 | 1.5248          | 0.7492 |
| 1.6504        | 24.07 | 39000 | 1.4930          | 0.7406 |
| 1.6263        | 25.0  | 40500 | 1.4572          | 0.7348 |
| 1.5893        | 25.93 | 42000 | 1.4202          | 0.7161 |
| 1.5669        | 26.85 | 43500 | 1.3987          | 0.7143 |
| 1.5277        | 27.78 | 45000 | 1.3512          | 0.6991 |
| 1.501         | 28.7  | 46500 | 1.3320          | 0.6879 |
| 1.4781        | 29.63 | 48000 | 1.3112          | 0.6788 |
| 1.4477        | 30.56 | 49500 | 1.2850          | 0.6657 |
| 1.4483        | 31.48 | 51000 | 1.2813          | 0.6633 |
| 1.4065        | 32.41 | 52500 | 1.2475          | 0.6541 |
| 1.3779        | 33.33 | 54000 | 1.2244          | 0.6503 |
| 1.3788        | 34.26 | 55500 | 1.2116          | 0.6407 |
| 1.3428        | 35.19 | 57000 | 1.1938          | 0.6352 |
| 1.3453        | 36.11 | 58500 | 1.1927          | 0.6340 |
| 1.3137        | 37.04 | 60000 | 1.1699          | 0.6252 |
| 1.2984        | 37.96 | 61500 | 1.1666          | 0.6229 |
| 1.2927        | 38.89 | 63000 | 1.1585          | 0.6188 |
| 1.2919        | 39.81 | 64500 | 1.1618          | 0.6190 |
| 1.293         | 40.74 | 66000 | 1.1479          | 0.6181 |
| 1.2853        | 41.67 | 67500 | 1.1423          | 0.6202 |
| 1.2687        | 42.59 | 69000 | 1.1315          | 0.6131 |
| 1.2603        | 43.52 | 70500 | 1.1333          | 0.6128 |
| 1.2577        | 44.44 | 72000 | 1.1191          | 0.6079 |
| 1.2435        | 45.37 | 73500 | 1.1177          | 0.6079 |
| 1.251         | 46.3  | 75000 | 1.1211          | 0.6092 |
| 1.2482        | 47.22 | 76500 | 1.1177          | 0.6060 |
| 1.2422        | 48.15 | 78000 | 1.1227          | 0.6097 |
| 1.2485        | 49.07 | 79500 | 1.1187          | 0.6071 |
| 1.2425        | 50.0  | 81000 | 1.1177          | 0.6058 |

### Framework versions

- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0
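
## Example: word error rate

The WER figures in this card follow the usual definition: the word-level edit distance between the reference and the model output, divided by the number of reference words. The sketch below illustrates that definition only. It assumes plain whitespace tokenization, and `word_error_rate` is a hypothetical helper, not the metric code used by the evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion: 2 errors / 4 reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, the ratio can exceed 1.0 when the output contains many extra words, which is consistent with the values slightly above 1.0 in the early rows of the training results. Multiplying the fraction by 100 gives the percentage form (0.6058 corresponds to 60.58).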
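
## Example: learning-rate schedule

The hyperparameters list a linear scheduler with 1500 warmup steps and a learning rate of 5e-06, and the training results end at step 81000. Assuming the common shape for such a schedule (linear warmup from zero to the peak, then linear decay to zero at the final step), the learning rate at a given step can be sketched as below. `linear_lr` is an illustrative name, and using 81000 as the total step count is an assumption taken from the last table row.

```python
def linear_lr(
    step: int,
    peak_lr: float = 5e-6,      # learning_rate from the hyperparameter list
    warmup_steps: int = 1500,   # lr_scheduler_warmup_steps
    total_steps: int = 81000,   # assumption: last step in the results table
) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp up proportionally to the step index.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale by the fraction of post-warmup steps remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 750, 1500, 41250, 81000):
        print(s, linear_lr(s))
```

Under these assumptions the rate reaches its peak at step 1500 (the first logged row) and has fallen to half the peak by step 41250.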