metadata

license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - common_voice_8_0
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xls-r-2b-frisian-cv-8
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: validation
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.040494215112126836
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: test
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.04223876282699812

wav2vec2-large-xls-r-2b-frisian-cv-8

This model is a fine-tuned version of facebook/wav2vec2-xls-r-2b on the common_voice_8_0 dataset. It achieves the following results on the evaluation set:

Loss: 0.0465
Wer: 0.0405

And on the test set:

Wer: 0.0422
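
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between reference and hypothesis, summed over the corpus and divided by the total number of reference words. It is not the evaluation code behind the numbers above; the function names `edit_distance` and `word_error_rate` are illustrative, and text normalization (casing, punctuation) is assumed to have been applied beforehand.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        total_edits += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    return total_edits / total_words


if __name__ == "__main__":
    # Toy strings, not taken from Common Voice.
    print(word_error_rate(["a b c d"], ["a x c"]))
```

Under this definition, a WER of 0.0422 corresponds to roughly 4.2 word errors per 100 reference words.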

Model description

This model was developed for my Master's thesis in "Voice Technology" at Rijksuniversiteit Groningen - Campus Fryslân. It corresponds to experiment 7, in which the training set consists of all validated data (~ 50 hours) except the test and evaluation sets (~ 4.5 hours each), which leaves about 41 hours of Frisian speech for training. It differs from experiment 2 in that I fine-tune the 2B-parameter version of XLS-R.

Intended uses & limitations

The intended use is for recognizing Frisian speech.

Limitations: decoding uses no language model (LM) rescoring, and the model was trained on version 8.0 of Common Voice rather than the newer 13.0.
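
A minimal sketch of transcribing a recording with the Transformers `pipeline` API is shown here. The Hub namespace of this model is not stated in the card, so `model_id` is left as a parameter the caller must supply, and the helper name `transcribe` is illustrative. The 2B-parameter checkpoint is large, so the first call is likely to download several gigabytes.

```python
def transcribe(audio_path, model_id):
    """Return the transcript of one audio file.

    model_id is the full Hub repo id of this model, i.e.
    "<namespace>/wav2vec2-large-xls-r-2b-frisian-cv-8". The namespace is
    not given in this card, so it is an assumption left to the caller.
    """
    # Imported lazily so this function can be defined even when
    # transformers is not installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # When given a file path, the pipeline decodes the audio (this needs
    # ffmpeg) and resamples it to the sampling rate the model expects.
    return asr(audio_path)["text"]
```

The returned text is the raw greedy CTC output, since no language model is attached.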

Training and evaluation data

The evaluation split used is the one available in the Common Voice 8.0 Frisian subset. The train split corresponds to all of the validated data except for the recordings found in the evaluation and test splits.

Training procedure

The script used for training this model can be found in this GitHub repository: link.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
mixed_precision_training: Native AMP
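
To illustrate what `lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1` means in practice, here is a small sketch of the learning-rate curve: a linear ramp from 0 to the peak rate over the first 10% of steps, then a linear decay back to 0. The function name `linear_schedule_lr` is illustrative, the total step count is not stated in this card and is therefore a parameter, and rounding the warmup length up with `math.ceil` is an assumption about how the ratio is turned into steps.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=3e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    # Assumption: the warmup length is the ratio times the total, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # 1000 total steps is a round number chosen for illustration only.
    for step in (0, 50, 100, 550, 1000):
        print(step, linear_schedule_lr(step, total_steps=1000))
```

With these settings the peak rate of 3e-05 is reached once, at the end of warmup, and every later step trains at a lower rate.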

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
6.3316	0.21	400	2.9773	1.0
2.7465	0.43	800	1.2564	0.9352
1.4576	0.64	1200	0.6275	0.5809
1.2245	0.86	1600	0.4438	0.4244
0.9928	1.07	2000	0.3058	0.3247
0.8768	1.29	2400	0.2656	0.2618
0.8686	1.5	2800	0.2155	0.2289
0.8325	1.72	3200	0.1924	0.2016
0.8495	1.93	3600	0.1748	0.1853
0.7069	2.14	4000	0.1792	0.1682
0.7381	2.36	4400	0.1540	0.1524
0.6648	2.57	4800	0.1397	0.1477
0.7471	2.79	5200	0.1372	0.1389
0.7219	3.0	5600	0.1296	0.1308
0.5894	3.22	6000	0.1167	0.1287
0.585	3.43	6400	0.1194	0.1264
0.5486	3.65	6800	0.1159	0.1248
0.5001	3.86	7200	0.1107	0.1160
0.4838	4.08	7600	0.1079	0.1212
0.4213	4.29	8000	0.1065	0.1145
0.4493	4.5	8400	0.0998	0.1098
0.4003	4.72	8800	0.0975	0.1027
0.4034	4.93	9200	0.0947	0.1023
0.3699	5.15	9600	0.0927	0.1006
0.3748	5.36	10000	0.0955	0.0994
0.3681	5.58	10400	0.0923	0.0952
0.3416	5.79	10800	0.0902	0.0968
0.3594	6.01	11200	0.0848	0.0935
0.3303	6.22	11600	0.0889	0.0921
0.3205	6.43	12000	0.0843	0.0893
0.3267	6.65	12400	0.0884	0.0882
0.33	6.86	12800	0.0859	0.0936
0.3023	7.08	13200	0.0830	0.0851
0.3057	7.29	13600	0.0826	0.0860
0.3007	7.51	14000	0.0841	0.0836
0.2981	7.72	14400	0.0790	0.0817
0.282	7.94	14800	0.0761	0.0779
0.2758	8.15	15200	0.0767	0.0776
0.275	8.36	15600	0.0788	0.0781
0.283	8.58	16000	0.0728	0.0775
0.2684	8.79	16400	0.0722	0.0742
0.2701	9.01	16800	0.0742	0.0720
0.248	9.22	17200	0.0711	0.0729
0.2467	9.44	17600	0.0698	0.0711
0.2588	9.65	18000	0.0688	0.0710
0.2566	9.87	18400	0.0699	0.0708
0.2425	10.08	18800	0.0699	0.0683
0.2292	10.29	19200	0.0697	0.0662
0.2317	10.51	19600	0.0670	0.0663
0.2381	10.72	20000	0.0649	0.0648
0.2281	10.94	20400	0.0619	0.0621
0.2329	11.15	20800	0.0648	0.0627
0.2197	11.37	21200	0.0630	0.0632
0.2406	11.58	21600	0.0611	0.0609
0.2221	11.8	22000	0.0621	0.0601
0.2316	12.01	22400	0.0637	0.0596
0.202	12.23	22800	0.0622	0.0592
0.2071	12.44	23200	0.0603	0.0589
0.2119	12.65	23600	0.0589	0.0581
0.2072	12.87	24000	0.0586	0.0588
0.1948	13.08	24400	0.0576	0.0562
0.1967	13.3	24800	0.0573	0.0543
0.1981	13.51	25200	0.0582	0.0567
0.1869	13.73	25600	0.0550	0.0533
0.1929	13.94	26000	0.0530	0.0540
0.1837	14.16	26400	0.0550	0.0519
0.1823	14.37	26800	0.0535	0.0521
0.1756	14.58	27200	0.0552	0.0515
0.1769	14.8	27600	0.0553	0.0502
0.1769	15.01	28000	0.0516	0.0493
0.1781	15.23	28400	0.0519	0.0485
0.1763	15.44	28800	0.0511	0.0482
0.1705	15.66	29200	0.0513	0.0471
0.1696	15.87	29600	0.0484	0.0467
0.1668	16.09	30000	0.0492	0.0464
0.1635	16.3	30400	0.0492	0.0470
0.1597	16.51	30800	0.0505	0.0471
0.152	16.73	31200	0.0495	0.0471
0.1589	16.94	31600	0.0478	0.0456
0.1586	17.16	32000	0.0490	0.0441
0.1516	17.37	32400	0.0482	0.0448
0.1506	17.59	32800	0.0485	0.0439
0.1513	17.8	33200	0.0485	0.0439
0.1545	18.02	33600	0.0479	0.0432
0.1472	18.23	34000	0.0479	0.0428
0.148	18.45	34400	0.0475	0.0424
0.1446	18.66	34800	0.0477	0.0420
0.1413	18.87	35200	0.0466	0.0416
0.1398	19.09	35600	0.0477	0.0407
0.1431	19.3	36000	0.0466	0.0406
0.1437	19.52	36400	0.0467	0.0401
0.1393	19.73	36800	0.0468	0.0404
0.1416	19.95	37200	0.0465	0.0405

Framework versions

Transformers 4.28.1
Pytorch 2.0.0+cu117
Datasets 2.11.0
Tokenizers 0.13.3