---
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- common_voice_8_0
metrics:
- wer
model-index:
- name: wav2vec2-large-xls-r-1b-frisian-cv-8
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_8_0
      type: common_voice_8_0
      config: fy-NL
      split: validation
      args: fy-NL
    metrics:
    - name: Wer
      type: wer
      value: 0.14290815597771747
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_8_0
      type: common_voice_8_0
      config: fy-NL
      split: test
      args: fy-NL
    metrics:
    - name: Wer
      type: wer
      value: 0.1413499060557884
---


# wav2vec2-large-xls-r-1b-frisian-cv-8

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the Frisian (`fy-NL`) subset of Common Voice 8.0 (`common_voice_8_0`).
It achieves the following results on the validation set:
- Loss: 0.2131
- WER: 0.1429

On the test set, it achieves:
- WER: 0.1413
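
WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the recognized text into the reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N. A test WER of 0.1413 therefore means roughly 14 word errors per 100 reference words. The following is a minimal pure-Python sketch of that definition, pooling errors over all utterances before dividing. It is not the evaluation code behind the numbers above, library implementations may differ in text normalization and tokenization, and the function names are illustrative.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the reference words seen so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref_word
                curr[j - 1] + 1,     # insert hyp_word
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """(S + D + I) / N, pooled over all utterance pairs; words are split on whitespace."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    ref_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        errors += word_edit_distance(ref_tokens, hyp.split())
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


if __name__ == "__main__":
    refs = ["the cat sat", "on the mat"]
    hyps = ["the cat sat", "on a mat"]
    print(corpus_wer(refs, hyps))  # → 0.16666666666666666 (1 substitution / 6 words)
```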

## Model description

This model was developed for my Master's thesis in Voice Technology at Rijksuniversiteit Groningen (University of Groningen), Campus Fryslân. It corresponds to Experiment 1, in which I use the same training set as the XLSR-53 baseline.

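## Intended uses & limitations

The model is intended for automatic speech recognition of West Frisian (`fy-NL`) speech.

Limitations: the output is not rescored with a language model (LM), and the model was trained on Common Voice 8.0 rather than the more recent 13.0 release.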
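
Without LM rescoring, the transcript comes directly from the acoustic model's per-frame output. Assuming the usual `Wav2Vec2ForCTC` setup, this is greedy (best-path) CTC decoding: pick the most likely token in every frame, merge consecutive repeats, and drop the blank token. In Transformers this corresponds to taking the argmax of the logits and passing the ids to the processor's `batch_decode`. The sketch below re-implements those steps on a hypothetical six-token vocabulary with hand-made frame scores, so it runs without downloading the model; the real vocabulary and special tokens come from the model's tokenizer files.

```python
from itertools import groupby
from typing import List, Sequence

# Hypothetical toy vocabulary for illustration only; the real vocabulary ships
# with the model's tokenizer. Index 0 acts as the CTC blank (in Wav2Vec2
# tokenizers this role is played by "<pad>"), and "|" marks word boundaries.
VOCAB: List[str] = ["<pad>", "|", "d", "e", "i", "k"]
BLANK_ID = 0
WORD_DELIMITER = "|"


def one_hot(token_id: int) -> List[float]:
    """Toy frame scores that put all the mass on a single token."""
    return [1.0 if j == token_id else 0.0 for j in range(len(VOCAB))]


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Most likely token id in each frame (no language model involved).
    best_ids = [max(range(len(scores)), key=lambda k: scores[k]) for scores in frame_scores]
    # 2. Merge runs of the same id; CTC emits a token once per run.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Remove blanks; they are what separates genuine double letters.
    tokens = [VOCAB[t] for t in collapsed if t != BLANK_ID]
    # 4. Turn word delimiters into spaces and normalize whitespace.
    text = "".join(" " if t == WORD_DELIMITER else t for t in tokens)
    return " ".join(text.split())


if __name__ == "__main__":
    frames = [one_hot(i) for i in [4, 4, 5, 0, 1, 2, 2, 3, 0, 4]]
    print(greedy_ctc_decode(frames))  # → ik dei
```

An LM-rescored decoder would instead run a beam search over the same frame scores and combine acoustic and n-gram LM scores (for example with `pyctcdecode`), which can fix errors that greedy decoding cannot.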

## Training and evaluation data

Training and evaluation use the splits provided with the Frisian (`fy-NL`) subset of Common Voice 8.0.

## Training procedure

The training script for this model is available in the [greenw0lf/MSc-VT-Thesis](https://github.com/greenw0lf/MSc-VT-Thesis/) GitHub repository.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 50
- mixed_precision_training: Native AMP
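
With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1`, the learning rate rises linearly from 0 to 5e-05 over the first 10% of optimizer steps and then decays linearly back to 0. The training log below ends at step 5800 (epoch 50.0); assuming that is the total number of optimizer steps, warmup covers the first 580 steps, about 5 epochs at 116 steps per epoch. The following plain-Python sketch reproduces that schedule. It follows the formula of Transformers' `get_linear_schedule_with_warmup`, with the warmup length obtained by rounding up `total_steps * warmup_ratio`, but it is an illustration rather than the code that ran; the function name and default arguments are assumptions.

```python
import math


def linear_warmup_decay_lr(step: int, base_lr: float = 5e-05,
                           total_steps: int = 5800, warmup_ratio: float = 0.1) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    # Warmup length derived from the ratio, rounded up (580 steps for the defaults).
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 towards base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 290, 580, 3190, 5800):
        print(step, linear_warmup_decay_lr(step))
```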

### Training results

| Training Loss | Epoch | Step | Validation Loss | WER |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 6.0565 | 1.72 | 200 | 3.1053 | 1.0 |
| 2.7675 | 3.45 | 400 | 1.1551 | 0.8611 |
| 1.3474 | 5.17 | 600 | 0.4770 | 0.4397 |
| 0.9617 | 6.9 | 800 | 0.3218 | 0.3343 |
| 0.9058 | 8.62 | 1000 | 0.2741 | 0.2768 |
| 0.9712 | 10.34 | 1200 | 0.2619 | 0.2505 |
| 0.6908 | 12.07 | 1400 | 0.2288 | 0.2243 |
| 0.745 | 13.79 | 1600 | 0.2288 | 0.2095 |
| 0.7742 | 15.52 | 1800 | 0.2289 | 0.1979 |
| 0.7231 | 17.24 | 2000 | 0.2198 | 0.1940 |
| 0.6475 | 18.97 | 2200 | 0.2180 | 0.1992 |
| 0.6421 | 20.69 | 2400 | 0.2133 | 0.1741 |
| 0.5925 | 22.41 | 2600 | 0.1998 | 0.1747 |
| 0.5608 | 24.14 | 2800 | 0.2212 | 0.1950 |
| 0.5315 | 25.86 | 3000 | 0.2187 | 0.1624 |
| 0.5362 | 27.59 | 3200 | 0.2057 | 0.1718 |
| 0.563 | 29.31 | 3400 | 0.2090 | 0.1613 |
| 0.4218 | 31.03 | 3600 | 0.2126 | 0.1531 |
| 0.3826 | 32.76 | 3800 | 0.2084 | 0.1538 |
| 0.356 | 34.48 | 4000 | 0.2115 | 0.1612 |
| 0.2966 | 36.21 | 4200 | 0.2093 | 0.1536 |
| 0.3377 | 37.93 | 4400 | 0.2061 | 0.1527 |
| 0.321 | 39.66 | 4600 | 0.2121 | 0.1463 |
| 0.2942 | 41.38 | 4800 | 0.2158 | 0.1441 |
| 0.2931 | 43.1 | 5000 | 0.2173 | 0.1446 |
| 0.2346 | 44.83 | 5200 | 0.2152 | 0.1436 |
| 0.2543 | 46.55 | 5400 | 0.2066 | 0.1445 |
| 0.2385 | 48.28 | 5600 | 0.2108 | 0.1432 |
| 0.2726 | 50.0 | 5800 | 0.2131 | 0.1429 |
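
In this log, the two evaluation signals do not bottom out at the same step: the lowest validation loss (0.1998) occurs at step 2600, whereas the lowest WER (0.1429) occurs at the final step, 5800, whose loss and WER match the evaluation results reported at the top of this card. The snippet below copies the Step, Validation Loss and WER columns and makes the comparison explicit; it also checks that the Epoch column is consistent with 116 optimizer steps per epoch (5800 steps / 50 epochs). It only illustrates why the checkpoint-selection criterion matters (compare `metric_for_best_model` in `TrainingArguments`); it does not describe how the released checkpoint was chosen, and the names are illustrative.

```python
from typing import List, Tuple

# (step, validation loss, WER) rows copied from the training-results table above.
EVAL_LOG: List[Tuple[int, float, float]] = [
    (200, 3.1053, 1.0), (400, 1.1551, 0.8611), (600, 0.4770, 0.4397),
    (800, 0.3218, 0.3343), (1000, 0.2741, 0.2768), (1200, 0.2619, 0.2505),
    (1400, 0.2288, 0.2243), (1600, 0.2288, 0.2095), (1800, 0.2289, 0.1979),
    (2000, 0.2198, 0.1940), (2200, 0.2180, 0.1992), (2400, 0.2133, 0.1741),
    (2600, 0.1998, 0.1747), (2800, 0.2212, 0.1950), (3000, 0.2187, 0.1624),
    (3200, 0.2057, 0.1718), (3400, 0.2090, 0.1613), (3600, 0.2126, 0.1531),
    (3800, 0.2084, 0.1538), (4000, 0.2115, 0.1612), (4200, 0.2093, 0.1536),
    (4400, 0.2061, 0.1527), (4600, 0.2121, 0.1463), (4800, 0.2158, 0.1441),
    (5000, 0.2173, 0.1446), (5200, 0.2152, 0.1436), (5400, 0.2066, 0.1445),
    (5600, 0.2108, 0.1432), (5800, 0.2131, 0.1429),
]

VAL_LOSS, WER = 1, 2  # column indices into EVAL_LOG rows

# 5800 steps over 50 epochs; assumes a constant number of optimizer steps per epoch.
STEPS_PER_EPOCH = 5800 / 50


def best_step(log: List[Tuple[int, float, float]], column: int) -> int:
    """Step whose row has the lowest value in the given column."""
    return min(log, key=lambda row: row[column])[0]


def epoch_of(step: int) -> float:
    """Fractional epoch for a step, rounded to two decimals as in the table."""
    return round(step / STEPS_PER_EPOCH, 2)


if __name__ == "__main__":
    print("lowest validation loss at step", best_step(EVAL_LOG, VAL_LOSS))  # → 2600
    print("lowest WER at step", best_step(EVAL_LOG, WER))  # → 5800
```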



### Framework versions

- Transformers 4.28.1
- PyTorch 2.0.0+cu117
- Datasets 2.11.0
- Tokenizers 0.13.3