metadata
license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - common_voice_8_0
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xls-r-2b-frisian-cv-8
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: validation
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.040494215112126836
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: test
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.04223876282699812

wav2vec2-large-xls-r-2b-frisian-cv-8

This model is a fine-tuned version of facebook/wav2vec2-xls-r-2b on the common_voice_8_0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0465
  • Wer: 0.0405

And on the test set:

  • Wer: 0.0422
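For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words, so 0.0405 is roughly four errors per hundred words. The following is a minimal sketch of that definition, not the evaluation code used for this model; the function name `word_error_rate` and the toy sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Note that a corpus-level WER is normally computed as total edits over total reference words across all utterances, which is not the same as averaging per-utterance scores.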

Model description

This model was developed for my Master's thesis in "Voice Technology" at Rijksuniversiteit Groningen - Campus Fryslân. It corresponds to experiment 7, in which the training set consists of all validated data (~50 hours) except the test and evaluation sets (~4.5 hours each), leaving about 41 hours of Frisian speech for training. It differs from experiment 2 in that it fine-tunes the 2B-parameter version of XLS-R.

Intended uses & limitations

The intended use is for recognizing Frisian speech.

Limitations: no language-model (LM) rescoring is applied to the output, and the model was trained on version 8.0 of Common Voice rather than the more recent 13.0.
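To illustrate what decoding without LM rescoring usually looks like, here is a minimal sketch of greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, and drop the blank token. It assumes the model is decoded from CTC outputs, which is typical for fine-tuned wav2vec 2.0 models but is not stated in this card. The function name, the toy vocabulary, and `blank_id=0` are hypothetical.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blanks."""
    # groupby merges runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical three-letter vocabulary; id 0 is the blank token.
vocab = {0: "", 1: "h", 2: "o", 3: "i"}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 3], vocab))
```

Because each frame is decided independently, nothing in this procedure can prefer a likely word sequence over an unlikely one; that is the gap an external language model would fill.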

Training and evaluation data

The evaluation split used is the one available in the Common Voice 8.0 Frisian subset. The train split corresponds to all of the validated data except for the recordings found in the evaluation and test splits.
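The split described above can be sketched as a simple set difference. This is an illustration under assumptions, not the author's script: it assumes the standard Common Voice release layout, where `validated.tsv`, `dev.tsv` and `test.tsv` each identify a recording by its `path` column. The helper names `read_tsv` and `build_train_split` are hypothetical.

```python
import csv


def read_tsv(path):
    """Load a Common Voice style TSV file as a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences intact.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def build_train_split(validated, dev, test, key="path"):
    """Keep every validated row whose recording is not in dev or test."""
    held_out = {row[key] for row in dev} | {row[key] for row in test}
    return [row for row in validated if row[key] not in held_out]


# Toy in-memory rows standing in for the TSV contents.
validated = [{"path": f"clip_{i}.mp3"} for i in range(5)]
dev = [{"path": "clip_1.mp3"}]
test = [{"path": "clip_3.mp3"}]
train = build_train_split(validated, dev, test)
print([row["path"] for row in train])
```

With real data, the three lists would come from `read_tsv` applied to the corresponding files in the fy-NL directory of the release.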

Training procedure

The script used for training this model can be found in this GitHub repository: link.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
  • mixed_precision_training: Native AMP
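As a rough illustration of what `lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1` implies, the sketch below computes the learning rate at a given optimizer step: a linear ramp from 0 to the peak during the first 10% of steps, then a linear decay back to 0. The function name is hypothetical, and the total step count is not stated in this card, so the example uses a round placeholder value.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=3e-5, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=1000 is a placeholder, not the value used in training.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps=1000))
```

The peak of 3e-05 is reached only at the end of warmup, so the early rows of the training log are produced at a much smaller learning rate than the headline value.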

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |
|---|---|---|---|---|
| 6.3316 | 0.21 | 400 | 2.9773 | 1.0 |
| 2.7465 | 0.43 | 800 | 1.2564 | 0.9352 |
| 1.4576 | 0.64 | 1200 | 0.6275 | 0.5809 |
| 1.2245 | 0.86 | 1600 | 0.4438 | 0.4244 |
| 0.9928 | 1.07 | 2000 | 0.3058 | 0.3247 |
| 0.8768 | 1.29 | 2400 | 0.2656 | 0.2618 |
| 0.8686 | 1.5 | 2800 | 0.2155 | 0.2289 |
| 0.8325 | 1.72 | 3200 | 0.1924 | 0.2016 |
| 0.8495 | 1.93 | 3600 | 0.1748 | 0.1853 |
| 0.7069 | 2.14 | 4000 | 0.1792 | 0.1682 |
| 0.7381 | 2.36 | 4400 | 0.1540 | 0.1524 |
| 0.6648 | 2.57 | 4800 | 0.1397 | 0.1477 |
| 0.7471 | 2.79 | 5200 | 0.1372 | 0.1389 |
| 0.7219 | 3.0 | 5600 | 0.1296 | 0.1308 |
| 0.5894 | 3.22 | 6000 | 0.1167 | 0.1287 |
| 0.585 | 3.43 | 6400 | 0.1194 | 0.1264 |
| 0.5486 | 3.65 | 6800 | 0.1159 | 0.1248 |
| 0.5001 | 3.86 | 7200 | 0.1107 | 0.1160 |
| 0.4838 | 4.08 | 7600 | 0.1079 | 0.1212 |
| 0.4213 | 4.29 | 8000 | 0.1065 | 0.1145 |
| 0.4493 | 4.5 | 8400 | 0.0998 | 0.1098 |
| 0.4003 | 4.72 | 8800 | 0.0975 | 0.1027 |
| 0.4034 | 4.93 | 9200 | 0.0947 | 0.1023 |
| 0.3699 | 5.15 | 9600 | 0.0927 | 0.1006 |
| 0.3748 | 5.36 | 10000 | 0.0955 | 0.0994 |
| 0.3681 | 5.58 | 10400 | 0.0923 | 0.0952 |
| 0.3416 | 5.79 | 10800 | 0.0902 | 0.0968 |
| 0.3594 | 6.01 | 11200 | 0.0848 | 0.0935 |
| 0.3303 | 6.22 | 11600 | 0.0889 | 0.0921 |
| 0.3205 | 6.43 | 12000 | 0.0843 | 0.0893 |
| 0.3267 | 6.65 | 12400 | 0.0884 | 0.0882 |
| 0.33 | 6.86 | 12800 | 0.0859 | 0.0936 |
| 0.3023 | 7.08 | 13200 | 0.0830 | 0.0851 |
| 0.3057 | 7.29 | 13600 | 0.0826 | 0.0860 |
| 0.3007 | 7.51 | 14000 | 0.0841 | 0.0836 |
| 0.2981 | 7.72 | 14400 | 0.0790 | 0.0817 |
| 0.282 | 7.94 | 14800 | 0.0761 | 0.0779 |
| 0.2758 | 8.15 | 15200 | 0.0767 | 0.0776 |
| 0.275 | 8.36 | 15600 | 0.0788 | 0.0781 |
| 0.283 | 8.58 | 16000 | 0.0728 | 0.0775 |
| 0.2684 | 8.79 | 16400 | 0.0722 | 0.0742 |
| 0.2701 | 9.01 | 16800 | 0.0742 | 0.0720 |
| 0.248 | 9.22 | 17200 | 0.0711 | 0.0729 |
| 0.2467 | 9.44 | 17600 | 0.0698 | 0.0711 |
| 0.2588 | 9.65 | 18000 | 0.0688 | 0.0710 |
| 0.2566 | 9.87 | 18400 | 0.0699 | 0.0708 |
| 0.2425 | 10.08 | 18800 | 0.0699 | 0.0683 |
| 0.2292 | 10.29 | 19200 | 0.0697 | 0.0662 |
| 0.2317 | 10.51 | 19600 | 0.0670 | 0.0663 |
| 0.2381 | 10.72 | 20000 | 0.0649 | 0.0648 |
| 0.2281 | 10.94 | 20400 | 0.0619 | 0.0621 |
| 0.2329 | 11.15 | 20800 | 0.0648 | 0.0627 |
| 0.2197 | 11.37 | 21200 | 0.0630 | 0.0632 |
| 0.2406 | 11.58 | 21600 | 0.0611 | 0.0609 |
| 0.2221 | 11.8 | 22000 | 0.0621 | 0.0601 |
| 0.2316 | 12.01 | 22400 | 0.0637 | 0.0596 |
| 0.202 | 12.23 | 22800 | 0.0622 | 0.0592 |
| 0.2071 | 12.44 | 23200 | 0.0603 | 0.0589 |
| 0.2119 | 12.65 | 23600 | 0.0589 | 0.0581 |
| 0.2072 | 12.87 | 24000 | 0.0586 | 0.0588 |
| 0.1948 | 13.08 | 24400 | 0.0576 | 0.0562 |
| 0.1967 | 13.3 | 24800 | 0.0573 | 0.0543 |
| 0.1981 | 13.51 | 25200 | 0.0582 | 0.0567 |
| 0.1869 | 13.73 | 25600 | 0.0550 | 0.0533 |
| 0.1929 | 13.94 | 26000 | 0.0530 | 0.0540 |
| 0.1837 | 14.16 | 26400 | 0.0550 | 0.0519 |
| 0.1823 | 14.37 | 26800 | 0.0535 | 0.0521 |
| 0.1756 | 14.58 | 27200 | 0.0552 | 0.0515 |
| 0.1769 | 14.8 | 27600 | 0.0553 | 0.0502 |
| 0.1769 | 15.01 | 28000 | 0.0516 | 0.0493 |
| 0.1781 | 15.23 | 28400 | 0.0519 | 0.0485 |
| 0.1763 | 15.44 | 28800 | 0.0511 | 0.0482 |
| 0.1705 | 15.66 | 29200 | 0.0513 | 0.0471 |
| 0.1696 | 15.87 | 29600 | 0.0484 | 0.0467 |
| 0.1668 | 16.09 | 30000 | 0.0492 | 0.0464 |
| 0.1635 | 16.3 | 30400 | 0.0492 | 0.0470 |
| 0.1597 | 16.51 | 30800 | 0.0505 | 0.0471 |
| 0.152 | 16.73 | 31200 | 0.0495 | 0.0471 |
| 0.1589 | 16.94 | 31600 | 0.0478 | 0.0456 |
| 0.1586 | 17.16 | 32000 | 0.0490 | 0.0441 |
| 0.1516 | 17.37 | 32400 | 0.0482 | 0.0448 |
| 0.1506 | 17.59 | 32800 | 0.0485 | 0.0439 |
| 0.1513 | 17.8 | 33200 | 0.0485 | 0.0439 |
| 0.1545 | 18.02 | 33600 | 0.0479 | 0.0432 |
| 0.1472 | 18.23 | 34000 | 0.0479 | 0.0428 |
| 0.148 | 18.45 | 34400 | 0.0475 | 0.0424 |
| 0.1446 | 18.66 | 34800 | 0.0477 | 0.0420 |
| 0.1413 | 18.87 | 35200 | 0.0466 | 0.0416 |
| 0.1398 | 19.09 | 35600 | 0.0477 | 0.0407 |
| 0.1431 | 19.3 | 36000 | 0.0466 | 0.0406 |
| 0.1437 | 19.52 | 36400 | 0.0467 | 0.0401 |
| 0.1393 | 19.73 | 36800 | 0.0468 | 0.0404 |
| 0.1416 | 19.95 | 37200 | 0.0465 | 0.0405 |

Framework versions

  • Transformers 4.28.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.11.0
  • Tokenizers 0.13.3