wav2vec2-large-xls-r-300m-br-d10

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Breton (br) configuration of the mozilla-foundation/common_voice_8_0 dataset. It achieves the following results on the evaluation set:

Loss: 1.1382
Wer: 0.4895
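
For context, word error rate (WER) is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words, so a WER of 0.4895 corresponds to roughly 49 word errors per 100 reference words. The following is a minimal standard-library sketch of that definition. The function name `word_error_rate` and the example sentences are hypothetical, and this is not the implementation used by eval.py.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative (made-up) transcripts: one substitution and one deletion
# against a six-word reference.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.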

Evaluation Commands

To evaluate on the test split of mozilla-foundation/common_voice_8_0:

python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-br-d10 --dataset mozilla-foundation/common_voice_8_0 --config br --split test --log_outputs

To evaluate on speech-recognition-community-v2/dev_data:

Not applicable. Breton is not available in speech-recognition-community-v2/dev_data.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 800
num_epochs: 50
mixed_precision_training: Native AMP
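
Two of these values follow from the others: total_train_batch_size is train_batch_size multiplied by gradient_accumulation_steps (16 × 2 = 32, which suggests a single device), and the linear scheduler ramps the learning rate up over the 800 warmup steps and then decays it to zero. The sketch below illustrates that arithmetic in plain Python. It assumes the common linear-with-warmup shape and takes the total of 7400 optimizer steps from the last row of the training results; the function names are hypothetical and this is not the training script itself.

```python
PEAK_LR = 0.0004          # learning_rate
WARMUP_STEPS = 800        # lr_scheduler_warmup_steps
TOTAL_STEPS = 7400        # assumption: final step reported in the training results
NUM_EPOCHS = 50           # num_epochs
TRAIN_BATCH_SIZE = 16     # train_batch_size
GRAD_ACCUM_STEPS = 2      # gradient_accumulation_steps


def effective_batch_size(per_device: int = TRAIN_BATCH_SIZE,
                         accumulation: int = GRAD_ACCUM_STEPS,
                         num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step (single device assumed)."""
    return per_device * accumulation * num_devices


def linear_lr(step: int,
              peak_lr: float = PEAK_LR,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step under linear warmup, then linear decay."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup))


# 7400 steps over 50 epochs gives 148 optimizer steps per epoch, which matches
# the table's first row (step 100 at roughly epoch 0.68).
STEPS_PER_EPOCH = TOTAL_STEPS // NUM_EPOCHS

for step in (0, 400, 800, 4100, 7400):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 800, between epochs 5 and 6, and is back to half its peak at step 4100.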

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
13.611	0.68	100	5.8492	1.0
3.8176	1.35	200	3.2181	1.0
3.0457	2.03	300	3.0902	1.0
2.2632	2.7	400	1.4882	0.9426
1.1965	3.38	500	1.1396	0.7950
0.984	4.05	600	1.0216	0.7583
0.8036	4.73	700	1.0258	0.7202
0.7061	5.41	800	0.9710	0.6820
0.689	6.08	900	0.9731	0.6488
0.6063	6.76	1000	0.9442	0.6569
0.5215	7.43	1100	1.0221	0.6671
0.4965	8.11	1200	0.9266	0.6181
0.4321	8.78	1300	0.9050	0.5991
0.3762	9.46	1400	0.9801	0.6134
0.3747	10.14	1500	0.9210	0.5747
0.3554	10.81	1600	0.9720	0.6051
0.3148	11.49	1700	0.9672	0.6099
0.3176	12.16	1800	1.0120	0.5966
0.2915	12.84	1900	0.9490	0.5653
0.2696	13.51	2000	0.9394	0.5819
0.2569	14.19	2100	1.0197	0.5667
0.2395	14.86	2200	0.9771	0.5608
0.2367	15.54	2300	1.0516	0.5678
0.2153	16.22	2400	1.0097	0.5679
0.2092	16.89	2500	1.0143	0.5430
0.2046	17.57	2600	1.0884	0.5631
0.1937	18.24	2700	1.0113	0.5648
0.1752	18.92	2800	1.0056	0.5470
0.164	19.59	2900	1.0340	0.5508
0.1723	20.27	3000	1.0743	0.5615
0.1535	20.95	3100	1.0495	0.5465
0.1432	21.62	3200	1.0390	0.5333
0.1561	22.3	3300	1.0798	0.5590
0.1384	22.97	3400	1.1716	0.5449
0.1359	23.65	3500	1.1154	0.5420
0.1356	24.32	3600	1.0883	0.5387
0.1355	25.0	3700	1.1114	0.5504
0.1158	25.68	3800	1.1171	0.5388
0.1166	26.35	3900	1.1335	0.5403
0.1165	27.03	4000	1.1374	0.5248
0.1064	27.7	4100	1.0336	0.5298
0.0987	28.38	4200	1.0407	0.5216
0.104	29.05	4300	1.1012	0.5350
0.0894	29.73	4400	1.1016	0.5310
0.0912	30.41	4500	1.1383	0.5302
0.0972	31.08	4600	1.0851	0.5214
0.0832	31.76	4700	1.1705	0.5311
0.0859	32.43	4800	1.0750	0.5192
0.0811	33.11	4900	1.0900	0.5180
0.0825	33.78	5000	1.1271	0.5196
0.07	34.46	5100	1.1289	0.5141
0.0689	35.14	5200	1.0960	0.5101
0.068	35.81	5300	1.1377	0.5050
0.0776	36.49	5400	1.0880	0.5194
0.0642	37.16	5500	1.1027	0.5076
0.0607	37.84	5600	1.1293	0.5119
0.0607	38.51	5700	1.1229	0.5103
0.0545	39.19	5800	1.1168	0.5103
0.0562	39.86	5900	1.1206	0.5073
0.0484	40.54	6000	1.1710	0.5019
0.0499	41.22	6100	1.1511	0.5100
0.0455	41.89	6200	1.1488	0.5009
0.0475	42.57	6300	1.1196	0.4944
0.0413	43.24	6400	1.1654	0.4996
0.0389	43.92	6500	1.0961	0.4930
0.0428	44.59	6600	1.0955	0.4938
0.039	45.27	6700	1.1323	0.4955
0.0352	45.95	6800	1.1040	0.4930
0.0334	46.62	6900	1.1382	0.4942
0.0338	47.3	7000	1.1264	0.4911
0.0307	47.97	7100	1.1216	0.4881
0.0286	48.65	7200	1.1459	0.4894
0.0348	49.32	7300	1.1419	0.4906
0.0329	50.0	7400	1.1382	0.4895
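
The final checkpoint (step 7400, WER 0.4895) is not the row with the lowest WER in this table: step 7100 reports 0.4881. The sketch below shows one way to scan such rows for the best checkpoint. It uses only the last five rows, copied from the table above, and the names `EvalRow`, `parse_rows`, and `best_by_wer` are hypothetical helpers introduced for illustration.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# Last five rows of the training results table (whitespace-separated).
LAST_ROWS = """
0.0338 47.3  7000 1.1264 0.4911
0.0307 47.97 7100 1.1216 0.4881
0.0286 48.65 7200 1.1459 0.4894
0.0348 49.32 7300 1.1419 0.4906
0.0329 50.0  7400 1.1382 0.4895
"""


def parse_rows(text: str) -> List[EvalRow]:
    """Parse lines of: training loss, epoch, step, validation loss, WER."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, wer = line.split()
        rows.append(EvalRow(float(train_loss), float(epoch), int(step),
                            float(val_loss), float(wer)))
    return rows


def best_by_wer(rows: List[EvalRow]) -> EvalRow:
    """Return the row with the lowest word error rate."""
    return min(rows, key=lambda row: row.wer)


rows = parse_rows(LAST_ROWS)
best = best_by_wer(rows)
print(f"lowest WER {best.wer} at step {best.step} (epoch {best.epoch})")
```

The same helper applied to the full table would also make the gap between the falling training loss and the flat validation loss easy to inspect.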

Framework versions

Transformers 4.16.2
Pytorch 1.10.0+cu111
Datasets 1.18.3
Tokenizers 0.11.0
