wav2vec2-large-xls-r-300m-kaqchikel-with-bloom

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on a collection of Kaqchikel audio from Deditos videos provided by Viña Studios and from Kaqchikel audiobooks on Bloom Library. It achieves the following results on the evaluation set:

  • Loss: 0.6700
  • CER: 0.0854
  • WER: 0.3069
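
For reference, WER (word error rate) counts word-level substitutions, insertions, and deletions between the reference transcription and the model output, while CER (character error rate) does the same at the character level. A minimal sketch using the jiwer library (the example strings below are invented for illustration, not taken from the evaluation set):

```python
from jiwer import cer, wer

# Invented example pair: "xb'e" is mis-transcribed as "xbe".
reference = "ri achin xb'e pa tinamit"
hypothesis = "ri achin xbe pa tinamit"

print(wer(reference, hypothesis))  # 0.2: 1 of 5 words is wrong
print(cer(reference, hypothesis))  # ~0.04: 1 character out of ~24 is wrong
```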

Model description

This model is a baseline model fine-tuned from XLS-R 300m. Users should refer to the original model for tutorials on using a trained model for inference.
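
For convenience, here is a minimal inference sketch. It assumes the model is published under the repo id sil-ai/wav2vec2-large-xls-r-300m-kaqchikel-with-bloom and that example.wav is a local recording; both are placeholders to substitute with your own.

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Placeholder repo id and audio path; substitute your own.
model_id = "sil-ai/wav2vec2-large-xls-r-300m-kaqchikel-with-bloom"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# XLS-R expects 16 kHz mono input; resample if needed.
waveform, sample_rate = torchaudio.load("example.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(
    waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt"
)
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```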

Intended uses & limitations

Users of this model should abide by the UN Declaration on the Rights of Indigenous Peoples.

This model is released under the MIT license, and no guarantees are made regarding the performance of the model in specific situations.

Training and evaluation data

Training, Validation, and Test datasets were generated from the same corpus, ensuring that no duplicate files were used.
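
The exact split procedure is not documented here; the sketch below illustrates one way to produce such splits with the datasets library, deduplicating by file path before splitting so that no file can appear in more than one split. All file names, field names, and split ratios are placeholders.

```python
from datasets import Dataset, DatasetDict

# Placeholder corpus with one duplicated file.
paths = [f"clip_{i:03d}.wav" for i in range(10)] + ["clip_000.wav"]
texts = [f"transcription {i}" for i in range(10)] + ["transcription 0"]

# Deduplicate by file path so each file survives exactly once.
unique = {}
for path, text in zip(paths, texts):
    unique.setdefault(path, text)

corpus = Dataset.from_dict({"path": list(unique), "text": list(unique.values())})

# 80% train, then split the remainder evenly into validation and test.
split = corpus.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)
dataset = DatasetDict(
    train=split["train"],
    validation=holdout["train"],
    test=holdout["test"],
)
```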

Training procedure

Standard fine-tuning of XLS-R was used, based on the examples in the Hugging Face Transformers GitHub repository.

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 100
  • mixed_precision_training: Native AMP
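
For orientation, here is a hedged sketch of how these values map onto transformers.TrainingArguments; output_dir is a placeholder, and the actual training script is the Transformers example referenced above.

```python
from transformers import TrainingArguments

# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# The Adam betas (0.9, 0.999) and epsilon (1e-08) are the defaults.
training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-kaqchikel-with-bloom",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # total train batch size: 8 * 4 = 32
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=100,
    fp16=True,  # Native AMP mixed-precision training
)
```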

Training results

| Training Loss | Epoch | Step | Validation Loss | CER    | WER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
| 11.1557       | 1.84  | 100  | 4.2251          | 1.0    | 1.0    |
| 3.7231        | 3.7   | 200  | 3.5794          | 1.0    | 1.0    |
| 3.3076        | 5.55  | 300  | 3.4362          | 1.0    | 1.0    |
| 3.2495        | 7.4   | 400  | 3.2553          | 1.0    | 1.0    |
| 3.2076        | 9.26  | 500  | 3.2932          | 1.0    | 1.0    |
| 3.1304        | 11.11 | 600  | 3.1100          | 1.0    | 1.0    |
| 2.899         | 12.95 | 700  | 2.4021          | 0.8477 | 1.0    |
| 2.2875        | 14.81 | 800  | 1.5473          | 0.4790 | 0.9984 |
| 1.7605        | 16.66 | 900  | 1.1034          | 0.3061 | 0.9192 |
| 1.3802        | 18.51 | 1000 | 0.9422          | 0.2386 | 0.8530 |
| 1.0989        | 20.37 | 1100 | 0.7429          | 0.1667 | 0.6042 |
| 0.857         | 22.22 | 1200 | 0.7490          | 0.1499 | 0.5751 |
| 0.6899        | 24.07 | 1300 | 0.6376          | 0.1286 | 0.4798 |
| 0.5927        | 25.92 | 1400 | 0.6887          | 0.1232 | 0.4443 |
| 0.4699        | 27.77 | 1500 | 0.6341          | 0.1184 | 0.4378 |
| 0.4029        | 29.62 | 1600 | 0.6341          | 0.1103 | 0.4216 |
| 0.3492        | 31.48 | 1700 | 0.6709          | 0.1121 | 0.4120 |
| 0.3019        | 33.33 | 1800 | 0.7665          | 0.1097 | 0.4136 |
| 0.2681        | 35.18 | 1900 | 0.6671          | 0.1085 | 0.4120 |
| 0.2491        | 37.04 | 2000 | 0.7049          | 0.1010 | 0.3748 |
| 0.2108        | 38.88 | 2100 | 0.6699          | 0.1064 | 0.3974 |
| 0.2146        | 40.73 | 2200 | 0.7037          | 0.1046 | 0.3780 |
| 0.1854        | 42.59 | 2300 | 0.6970          | 0.1055 | 0.4006 |
| 0.1693        | 44.44 | 2400 | 0.6593          | 0.0980 | 0.3764 |
| 0.1628        | 46.29 | 2500 | 0.7162          | 0.0998 | 0.3764 |
| 0.156         | 48.15 | 2600 | 0.6445          | 0.0998 | 0.3829 |
| 0.1439        | 49.99 | 2700 | 0.6437          | 0.1004 | 0.3845 |
| 0.1292        | 51.84 | 2800 | 0.6471          | 0.0944 | 0.3457 |
| 0.1287        | 53.7  | 2900 | 0.6411          | 0.0923 | 0.3538 |
| 0.1186        | 55.55 | 3000 | 0.6754          | 0.0992 | 0.3813 |
| 0.1175        | 57.4  | 3100 | 0.6741          | 0.0953 | 0.3538 |
| 0.1082        | 59.26 | 3200 | 0.6949          | 0.0977 | 0.3619 |
| 0.105         | 61.11 | 3300 | 0.6919          | 0.0983 | 0.3683 |
| 0.1048        | 62.95 | 3400 | 0.6802          | 0.0950 | 0.3425 |
| 0.092         | 64.81 | 3500 | 0.6830          | 0.0962 | 0.3263 |
| 0.0904        | 66.66 | 3600 | 0.6993          | 0.0971 | 0.3554 |
| 0.0914        | 68.51 | 3700 | 0.6932          | 0.0995 | 0.3554 |
| 0.0823        | 70.37 | 3800 | 0.6742          | 0.0950 | 0.3409 |
| 0.0799        | 72.22 | 3900 | 0.6852          | 0.0917 | 0.3279 |
| 0.0767        | 74.07 | 4000 | 0.6684          | 0.0929 | 0.3489 |
| 0.0736        | 75.92 | 4100 | 0.6611          | 0.0923 | 0.3393 |
| 0.0708        | 77.77 | 4200 | 0.7123          | 0.0944 | 0.3393 |
| 0.0661        | 79.62 | 4300 | 0.6577          | 0.0899 | 0.3247 |
| 0.0651        | 81.48 | 4400 | 0.6671          | 0.0869 | 0.3150 |
| 0.0607        | 83.33 | 4500 | 0.6980          | 0.0893 | 0.3231 |
| 0.0552        | 85.18 | 4600 | 0.6947          | 0.0884 | 0.3183 |
| 0.0574        | 87.04 | 4700 | 0.6652          | 0.0899 | 0.3183 |
| 0.0503        | 88.88 | 4800 | 0.6798          | 0.0863 | 0.3053 |
| 0.0479        | 90.73 | 4900 | 0.6690          | 0.0884 | 0.3166 |
| 0.0483        | 92.59 | 5000 | 0.6789          | 0.0872 | 0.3069 |
| 0.0437        | 94.44 | 5100 | 0.6758          | 0.0875 | 0.3069 |
| 0.0458        | 96.29 | 5200 | 0.6662          | 0.0884 | 0.3102 |
| 0.0434        | 98.15 | 5300 | 0.6699          | 0.0881 | 0.3069 |
| 0.0449        | 99.99 | 5400 | 0.6700          | 0.0854 | 0.3069 |

Framework versions

  • Transformers 4.11.3
  • PyTorch 1.10.0+cu113
  • Datasets 2.2.1
  • Tokenizers 0.10.3