wav2vec2-large-xlsr53-zh-cn-subset-colab

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the common_voice dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3992
  • Wer: 0.9395
  • Cer: 0.3184
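
The gap between WER (word error rate) and CER (character error rate) is worth a note. The card does not say how either metric was computed, so the following is only a sketch of one plausible explanation: if WER splits text on whitespace, and Chinese transcripts contain no spaces, each sentence counts as a single "word" and any character error makes the whole sentence wrong. The helper names and the example sentence below are illustrative, not taken from the training code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate, assuming words are separated by whitespace."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over the raw strings."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a six-character sentence.
reference = "今天天气很好"
hypothesis = "今天天汽很好"
print("WER:", wer(reference, hypothesis))
print("CER:", cer(reference, hypothesis))
```

Under that assumption, CER is the more informative of the two numbers for this model.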

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 13
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 26
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100
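
Two of these values are derived rather than set directly. The sketch below shows how total_train_batch_size follows from the per-device batch size and gradient accumulation, and what a linear schedule with 500 warmup steps looks like. It is a plain-Python illustration, not the training code: TOTAL_STEPS is an assumption inferred from the results table (about 210 optimizer steps per epoch over 100 epochs), and the function names are hypothetical.

```python
BASE_LR = 5e-05
WARMUP_STEPS = 500
TOTAL_STEPS = 21_000  # assumption: ~210 steps/epoch x 100 epochs


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step):
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(effective_batch_size(13, 2))  # 13 x 2 on a single device
for step in (0, 250, 500, 10_000, 21_000):
    print(step, linear_lr(step))
```

A total of 26 with train_batch_size 13 and gradient_accumulation_steps 2 is consistent with training on a single device.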

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|---|---|---|---|---|---|
| No log | 1.9 | 400 | 33.6533 | 1.0 | 1.0 |
| 70.5767 | 3.81 | 800 | 6.8140 | 1.0 | 1.0 |
| 7.1379 | 5.71 | 1200 | 6.5163 | 1.0 | 1.0 |
| 6.4771 | 7.62 | 1600 | 6.4602 | 1.0 | 1.0 |
| 6.3627 | 9.52 | 2000 | 6.3406 | 1.0 | 0.9700 |
| 6.3627 | 11.43 | 2400 | 6.1021 | 1.0 | 0.9678 |
| 6.1201 | 13.33 | 2800 | 5.1523 | 1.0 | 0.8385 |
| 5.3531 | 15.24 | 3200 | 4.2224 | 1.0 | 0.7084 |
| 4.1733 | 17.14 | 3600 | 3.6981 | 1.0 | 0.6332 |
| 3.5472 | 19.05 | 4000 | 3.2708 | 0.9994 | 0.5827 |
| 3.5472 | 20.95 | 4400 | 2.9629 | 0.9989 | 0.5510 |
| 3.0668 | 22.86 | 4800 | 2.7122 | 0.9943 | 0.5165 |
| 2.7248 | 24.76 | 5200 | 2.5171 | 0.9914 | 0.4976 |
| 2.4609 | 26.67 | 5600 | 2.3538 | 0.9897 | 0.4759 |
| 2.2323 | 28.57 | 6000 | 2.2112 | 0.9874 | 0.4555 |
| 2.2323 | 30.48 | 6400 | 2.0850 | 0.9834 | 0.4370 |
| 2.0438 | 32.38 | 6800 | 1.9982 | 0.9806 | 0.4261 |
| 1.8837 | 34.29 | 7200 | 1.9179 | 0.9766 | 0.4137 |
| 1.7646 | 36.19 | 7600 | 1.8278 | 0.9766 | 0.4030 |
| 1.6469 | 38.1 | 8000 | 1.7627 | 0.9755 | 0.3937 |
| 1.6469 | 40.0 | 8400 | 1.7063 | 0.9709 | 0.3853 |
| 1.5422 | 41.9 | 8800 | 1.6649 | 0.9663 | 0.3787 |
| 1.4561 | 43.81 | 9200 | 1.6336 | 0.9697 | 0.3714 |
| 1.3842 | 45.71 | 9600 | 1.5943 | 0.9606 | 0.3647 |
| 1.3164 | 47.62 | 10000 | 1.5681 | 0.9669 | 0.3621 |
| 1.3164 | 49.52 | 10400 | 1.5535 | 0.9600 | 0.3582 |
| 1.2654 | 51.43 | 10800 | 1.5354 | 0.9538 | 0.3544 |
| 1.2186 | 53.33 | 11200 | 1.5003 | 0.9555 | 0.3482 |
| 1.1781 | 55.24 | 11600 | 1.4979 | 0.9572 | 0.3473 |
| 1.1344 | 57.14 | 12000 | 1.4820 | 0.9549 | 0.3453 |
| 1.1344 | 59.05 | 12400 | 1.4707 | 0.9509 | 0.3396 |
| 1.0965 | 60.95 | 12800 | 1.4657 | 0.9509 | 0.3384 |
| 1.0637 | 62.86 | 13200 | 1.4610 | 0.9509 | 0.3371 |
| 1.0306 | 64.76 | 13600 | 1.4461 | 0.9509 | 0.3361 |
| 1.0014 | 66.67 | 14000 | 1.4437 | 0.9503 | 0.3328 |
| 1.0014 | 68.57 | 14400 | 1.4334 | 0.9463 | 0.3304 |
| 0.9758 | 70.48 | 14800 | 1.4267 | 0.9429 | 0.3295 |
| 0.9486 | 72.38 | 15200 | 1.4250 | 0.9469 | 0.3269 |
| 0.933 | 74.29 | 15600 | 1.4214 | 0.9441 | 0.3273 |
| 0.9121 | 76.19 | 16000 | 1.4161 | 0.9441 | 0.3267 |
| 0.9121 | 78.1 | 16400 | 1.4137 | 0.9446 | 0.3268 |
| 0.9001 | 80.0 | 16800 | 1.4216 | 0.9446 | 0.3253 |
| 0.8789 | 81.9 | 17200 | 1.4164 | 0.9435 | 0.3264 |
| 0.8659 | 83.81 | 17600 | 1.3996 | 0.9424 | 0.3216 |
| 0.8471 | 85.71 | 18000 | 1.4079 | 0.9458 | 0.3226 |
| 0.8471 | 87.62 | 18400 | 1.4042 | 0.9412 | 0.3214 |
| 0.8387 | 89.52 | 18800 | 1.4073 | 0.9424 | 0.3214 |
| 0.8299 | 91.43 | 19200 | 1.4005 | 0.9418 | 0.3192 |
| 0.8257 | 93.33 | 19600 | 1.4040 | 0.9406 | 0.3200 |
| 0.813 | 95.24 | 20000 | 1.4012 | 0.9412 | 0.3184 |
| 0.813 | 97.14 | 20400 | 1.4011 | 0.9389 | 0.3183 |
| 0.8062 | 99.05 | 20800 | 1.3992 | 0.9395 | 0.3184 |
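
The Epoch and Step columns together give a rough idea of the training set size, which the card does not state. The sketch below estimates optimizer steps per epoch from three rows of the table and multiplies by the total train batch size of 26. The result is an approximation only: it assumes a single device and ignores a possibly partial last batch, and the function and variable names are illustrative.

```python
# (epoch, step) pairs copied from the results table.
ROWS = [(40.0, 8400), (80.0, 16800), (99.05, 20800)]
TOTAL_TRAIN_BATCH_SIZE = 26
NUM_EPOCHS = 100


def steps_per_epoch(rows):
    """Average step/epoch ratio, rounded to the nearest whole step."""
    ratios = [step / epoch for epoch, step in rows]
    return round(sum(ratios) / len(ratios))


per_epoch = steps_per_epoch(ROWS)
approx_examples = per_epoch * TOTAL_TRAIN_BATCH_SIZE
approx_total_steps = per_epoch * NUM_EPOCHS

print("steps per epoch:", per_epoch)
print("approx. training examples:", approx_examples)
print("approx. total optimizer steps:", approx_total_steps)
```

On these assumptions the run used a training subset of roughly five and a half thousand clips, which fits the "subset" in the model name.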

Framework versions

  • Transformers 4.32.0.dev0
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.4
  • Tokenizers 0.13.3