Edit model card

wav2vec2-large-xls-r-300m-hi-wx1

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 -HI dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6552
  • Wer: 0.3200

Evaluation Commands

  1. To evaluate on mozilla-foundation/common_voice_8_0 with test split

python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-hi-wx1 --dataset mozilla-foundation/common_voice_7_0 --config hi --split test --log_outputs

  1. To evaluate on speech-recognition-community-v2/dev_data

NA

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00024
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1800
  • num_epochs: 50
  • mixed_precision_training: Native AMP

Training results

Training Loss Epoch Step Validation Loss Wer
12.2663 1.36 200 5.9245 1.0
4.1856 2.72 400 3.4968 1.0
3.3908 4.08 600 2.9970 1.0
1.5444 5.44 800 0.9071 0.6139
0.7237 6.8 1000 0.6508 0.4862
0.5323 8.16 1200 0.6217 0.4647
0.4426 9.52 1400 0.5785 0.4288
0.3933 10.88 1600 0.5935 0.4217
0.3532 12.24 1800 0.6358 0.4465
0.3319 13.6 2000 0.5789 0.4118
0.2877 14.96 2200 0.6163 0.4056
0.2663 16.33 2400 0.6176 0.3893
0.2511 17.68 2600 0.6065 0.3999
0.2275 19.05 2800 0.6183 0.3842
0.2098 20.41 3000 0.6486 0.3864
0.1943 21.77 3200 0.6365 0.3885
0.1877 23.13 3400 0.6013 0.3677
0.1679 24.49 3600 0.6451 0.3795
0.1667 25.85 3800 0.6410 0.3635
0.1514 27.21 4000 0.6000 0.3577
0.1453 28.57 4200 0.6020 0.3518
0.134 29.93 4400 0.6531 0.3517
0.1354 31.29 4600 0.6874 0.3578
0.1224 32.65 4800 0.6519 0.3492
0.1199 34.01 5000 0.6553 0.3490
0.1077 35.37 5200 0.6621 0.3429
0.0997 36.73 5400 0.6641 0.3413
0.0964 38.09 5600 0.6722 0.3385
0.0931 39.45 5800 0.6365 0.3363
0.0944 40.81 6000 0.6454 0.3326
0.0862 42.18 6200 0.6497 0.3256
0.0848 43.54 6400 0.6599 0.3226
0.0793 44.89 6600 0.6625 0.3232
0.076 46.26 6800 0.6463 0.3186
0.0749 47.62 7000 0.6559 0.3225
0.0663 48.98 7200 0.6552 0.3200

Framework versions

  • Transformers 4.16.2
  • Pytorch 1.10.0+cu111
  • Datasets 1.18.3
  • Tokenizers 0.11.0
Downloads last month
8

Dataset used to train DrishtiSharma/wav2vec2-large-xls-r-300m-hi-wx1

Evaluation results