	---
	license: apache-2.0
	language:
	- vi
	tags:
	- automatic-speech-recognition
	- common-voice
	- hf-asr-leaderboard
	- robust-speech-event
	datasets:
	- mozilla-foundation/common_voice_7_0
	model-index:
	- name: xls-asr-vi-40h
	  results:
	  - task:
	      name: Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Common Voice 7.0
	      type: mozilla-foundation/common_voice_7_0
	      args: vi
	    metrics:
	    - name: Test WER (with Language model)
	      type: wer
	      value: 56.57
	---

	# xls-asr-vi-40h

	This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Vietnamese (`vi`) subset of Common Voice 7.0 and a private dataset.
	It achieves the following results on the evaluation set (without a language model):
	- Loss: 1.1177
	- WER: 60.58%
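For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It assumes plain whitespace tokenization and no text normalization; the function name `word_error_rate` is illustrative, and the evaluation script used for this model may preprocess text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: assumes whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("xin chào các bạn", "xin chao các bạn"))
    # Three inserted words against a two-word reference: the ratio exceeds 1.
    print(word_error_rate("xin chào", "xin xin chào bạn ơi"))
```

Because insertions are counted against the reference length, the ratio can exceed 1.0, which is why some early rows of the training results table show values slightly above 1.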

	## Evaluation
	Run the evaluation script (`eval_custom.py`):

	```bash
	# In a notebook cell, prefix this command with "!"
	python eval_custom.py --model_id geninhu/xls-asr-vi-40h --dataset mozilla-foundation/common_voice_7_0 --config vi --split test
	```

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 16
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 1500
	- num_epochs: 50.0
	- mixed_precision_training: Native AMP
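The learning-rate settings above can be made concrete with a small sketch of a linear schedule with warmup. It assumes the rate rises linearly from 0 to `learning_rate` over the warmup steps and then decays linearly to 0 at the final step, and it takes the total step count (81000) from the last row of the training results table. The function name `linear_lr` is illustrative and not part of the training code.

```python
def linear_lr(
    step: int,
    base_lr: float = 5e-6,      # learning_rate from the list above
    warmup_steps: int = 1500,   # lr_scheduler_warmup_steps from the list above
    total_steps: int = 81000,   # assumption: final Step value in the results table
) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 750, 1500, 41250, 81000):
        print(s, linear_lr(s))
```

Under these assumptions the peak rate of 5e-06 is reached at step 1500, which coincides with the first evaluation row in the training results.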

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss | Wer    |
	|:-------------:|:-----:|:-----:|:---------------:|:------:|
	| 23.3878 | 0.93 | 1500 | 21.9179 | 1.0 |
	| 8.8862 | 1.85 | 3000 | 6.0599 | 1.0 |
	| 4.3701 | 2.78 | 4500 | 4.3837 | 1.0 |
	| 4.113 | 3.7 | 6000 | 4.2698 | 0.9982 |
	| 3.9666 | 4.63 | 7500 | 3.9726 | 0.9989 |
	| 3.5965 | 5.56 | 9000 | 3.7124 | 0.9975 |
	| 3.3944 | 6.48 | 10500 | 3.5005 | 1.0057 |
	| 3.304 | 7.41 | 12000 | 3.3710 | 1.0043 |
	| 3.2482 | 8.33 | 13500 | 3.4201 | 1.0155 |
	| 3.212 | 9.26 | 15000 | 3.3732 | 1.0151 |
	| 3.1778 | 10.19 | 16500 | 3.2763 | 1.0009 |
	| 3.1027 | 11.11 | 18000 | 3.1943 | 1.0025 |
	| 2.9905 | 12.04 | 19500 | 2.8082 | 0.9703 |
	| 2.7095 | 12.96 | 21000 | 2.4993 | 0.9302 |
	| 2.4862 | 13.89 | 22500 | 2.3072 | 0.9140 |
	| 2.3271 | 14.81 | 24000 | 2.1398 | 0.8949 |
	| 2.1968 | 15.74 | 25500 | 2.0594 | 0.8817 |
	| 2.111 | 16.67 | 27000 | 1.9404 | 0.8630 |
	| 2.0387 | 17.59 | 28500 | 1.8895 | 0.8497 |
	| 1.9504 | 18.52 | 30000 | 1.7961 | 0.8315 |
	| 1.9039 | 19.44 | 31500 | 1.7433 | 0.8213 |
	| 1.8342 | 20.37 | 33000 | 1.6790 | 0.7994 |
	| 1.7824 | 21.3 | 34500 | 1.6291 | 0.7825 |
	| 1.7359 | 22.22 | 36000 | 1.5783 | 0.7706 |
	| 1.7053 | 23.15 | 37500 | 1.5248 | 0.7492 |
	| 1.6504 | 24.07 | 39000 | 1.4930 | 0.7406 |
	| 1.6263 | 25.0 | 40500 | 1.4572 | 0.7348 |
	| 1.5893 | 25.93 | 42000 | 1.4202 | 0.7161 |
	| 1.5669 | 26.85 | 43500 | 1.3987 | 0.7143 |
	| 1.5277 | 27.78 | 45000 | 1.3512 | 0.6991 |
	| 1.501 | 28.7 | 46500 | 1.3320 | 0.6879 |
	| 1.4781 | 29.63 | 48000 | 1.3112 | 0.6788 |
	| 1.4477 | 30.56 | 49500 | 1.2850 | 0.6657 |
	| 1.4483 | 31.48 | 51000 | 1.2813 | 0.6633 |
	| 1.4065 | 32.41 | 52500 | 1.2475 | 0.6541 |
	| 1.3779 | 33.33 | 54000 | 1.2244 | 0.6503 |
	| 1.3788 | 34.26 | 55500 | 1.2116 | 0.6407 |
	| 1.3428 | 35.19 | 57000 | 1.1938 | 0.6352 |
	| 1.3453 | 36.11 | 58500 | 1.1927 | 0.6340 |
	| 1.3137 | 37.04 | 60000 | 1.1699 | 0.6252 |
	| 1.2984 | 37.96 | 61500 | 1.1666 | 0.6229 |
	| 1.2927 | 38.89 | 63000 | 1.1585 | 0.6188 |
	| 1.2919 | 39.81 | 64500 | 1.1618 | 0.6190 |
	| 1.293 | 40.74 | 66000 | 1.1479 | 0.6181 |
	| 1.2853 | 41.67 | 67500 | 1.1423 | 0.6202 |
	| 1.2687 | 42.59 | 69000 | 1.1315 | 0.6131 |
	| 1.2603 | 43.52 | 70500 | 1.1333 | 0.6128 |
	| 1.2577 | 44.44 | 72000 | 1.1191 | 0.6079 |
	| 1.2435 | 45.37 | 73500 | 1.1177 | 0.6079 |
	| 1.251 | 46.3 | 75000 | 1.1211 | 0.6092 |
	| 1.2482 | 47.22 | 76500 | 1.1177 | 0.6060 |
	| 1.2422 | 48.15 | 78000 | 1.1227 | 0.6097 |
	| 1.2485 | 49.07 | 79500 | 1.1187 | 0.6071 |
	| 1.2425 | 50.0 | 81000 | 1.1177 | 0.6058 |
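The Epoch and Step columns are related by a fixed number of optimizer steps per epoch, which the final row (step 81000 at epoch 50.0) lets us recover. The sketch below works through that arithmetic. The per-epoch example count is an estimate that assumes a single device and no gradient accumulation, neither of which the card states. The names `steps_per_epoch` and `epoch_at` are illustrative.

```python
FINAL_STEP = 81000       # last Step value in the table
FINAL_EPOCH = 50.0       # last Epoch value in the table
TRAIN_BATCH_SIZE = 16    # train_batch_size from the hyperparameters

# Optimizer steps taken in one pass over the training data.
steps_per_epoch = FINAL_STEP / FINAL_EPOCH


def epoch_at(step: int) -> float:
    """Epoch reached at a given step, rounded to two decimals as in the table."""
    return round(step / steps_per_epoch, 2)


# Assumption: one device, no gradient accumulation, so each step sees
# exactly TRAIN_BATCH_SIZE examples.
approx_examples_per_epoch = int(steps_per_epoch) * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(steps_per_epoch)
    print(epoch_at(1500), epoch_at(3000), epoch_at(40500))
    print(approx_examples_per_epoch)
```

Evaluation ran every 1500 steps, slightly less than one epoch, which is why the Epoch column advances in increments of about 0.93.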


	### Framework versions

	- Transformers 4.16.0.dev0
	- PyTorch 1.10.1+cu102
	- Datasets 1.17.1.dev0
	- Tokenizers 0.11.0
