wav2vec2-large-xls-r-300m-myv-v1

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - MYV dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8537
  • Wer: 0.6160

Evaluation Commands

  1. To evaluate on mozilla-foundation/common_voice_8_0 with test split

python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-myv-v1 --dataset mozilla-foundation/common_voice_8_0 --config myv --split test --log_outputs

  1. To evaluate on speech-recognition-community-v2/dev_data

Erzya language not found in speech-recognition-community-v2/dev_data!

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.000222
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 150
  • mixed_precision_training: Native AMP

Training results

Training Loss Epoch Step Validation Loss Wer
19.453 1.92 50 16.4001 1.0
9.6875 3.85 100 5.4468 1.0
4.9988 5.77 150 4.3507 1.0
4.1148 7.69 200 3.6753 1.0
3.4922 9.62 250 3.3103 1.0
3.2443 11.54 300 3.1741 1.0
3.164 13.46 350 3.1346 1.0
3.0954 15.38 400 3.0428 1.0
3.0076 17.31 450 2.9137 1.0
2.6883 19.23 500 2.1476 0.9978
1.5124 21.15 550 0.8955 0.8225
0.8711 23.08 600 0.6948 0.7591
0.6695 25.0 650 0.6683 0.7636
0.5606 26.92 700 0.6821 0.7435
0.503 28.85 750 0.7220 0.7516
0.4528 30.77 800 0.6638 0.7324
0.4219 32.69 850 0.7120 0.7435
0.4109 34.62 900 0.7122 0.7511
0.3887 36.54 950 0.7179 0.7199
0.3895 38.46 1000 0.7322 0.7525
0.391 40.38 1050 0.6850 0.7364
0.3537 42.31 1100 0.7571 0.7279
0.3267 44.23 1150 0.7575 0.7257
0.3195 46.15 1200 0.7580 0.6998
0.2891 48.08 1250 0.7452 0.7101
0.294 50.0 1300 0.7316 0.6945
0.2854 51.92 1350 0.7241 0.6757
0.2801 53.85 1400 0.7532 0.6887
0.2502 55.77 1450 0.7587 0.6811
0.2427 57.69 1500 0.7231 0.6851
0.2311 59.62 1550 0.7288 0.6632
0.2176 61.54 1600 0.7711 0.6664
0.2117 63.46 1650 0.7914 0.6940
0.2114 65.38 1700 0.8065 0.6918
0.1913 67.31 1750 0.8372 0.6945
0.1897 69.23 1800 0.8051 0.6869
0.1865 71.15 1850 0.8076 0.6740
0.1844 73.08 1900 0.7935 0.6708
0.1757 75.0 1950 0.8015 0.6610
0.1636 76.92 2000 0.7614 0.6414
0.1637 78.85 2050 0.8123 0.6592
0.1599 80.77 2100 0.7907 0.6566
0.1498 82.69 2150 0.8641 0.6757
0.1545 84.62 2200 0.7438 0.6682
0.1433 86.54 2250 0.8014 0.6624
0.1427 88.46 2300 0.7758 0.6646
0.1423 90.38 2350 0.7741 0.6423
0.1298 92.31 2400 0.7938 0.6414
0.1111 94.23 2450 0.7976 0.6467
0.1243 96.15 2500 0.7916 0.6481
0.1215 98.08 2550 0.7594 0.6392
0.113 100.0 2600 0.8236 0.6392
0.1077 101.92 2650 0.7959 0.6347
0.0988 103.85 2700 0.8189 0.6392
0.0953 105.77 2750 0.8157 0.6414
0.0889 107.69 2800 0.7946 0.6369
0.0929 109.62 2850 0.8255 0.6360
0.0822 111.54 2900 0.8320 0.6334
0.086 113.46 2950 0.8539 0.6490
0.0825 115.38 3000 0.8438 0.6418
0.0727 117.31 3050 0.8568 0.6481
0.0717 119.23 3100 0.8447 0.6512
0.0815 121.15 3150 0.8470 0.6445
0.0689 123.08 3200 0.8264 0.6249
0.0726 125.0 3250 0.7981 0.6169
0.0648 126.92 3300 0.8237 0.6200
0.0632 128.85 3350 0.8416 0.6249
0.06 130.77 3400 0.8276 0.6173
0.0616 132.69 3450 0.8429 0.6209
0.0614 134.62 3500 0.8485 0.6271
0.0539 136.54 3550 0.8598 0.6218
0.0555 138.46 3600 0.8557 0.6169
0.0604 140.38 3650 0.8436 0.6186
0.0556 142.31 3700 0.8428 0.6178
0.051 144.23 3750 0.8440 0.6142
0.0526 146.15 3800 0.8566 0.6142
0.052 148.08 3850 0.8544 0.6178
0.0519 150.0 3900 0.8537 0.6160

Framework versions

  • Transformers 4.16.2
  • Pytorch 1.10.0+cu111
  • Datasets 1.18.2
  • Tokenizers 0.11.0
Downloads last month
8
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train DrishtiSharma/wav2vec2-large-xls-r-300m-myv-v1

Spaces using DrishtiSharma/wav2vec2-large-xls-r-300m-myv-v1 2

Evaluation results