wav2vec2-large-xlsr53-zh-cn-subset20-colab

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the common_voice dataset. It achieves the following results on the evaluation set (final checkpoint, step 20800):

  • Loss: 2.0566
  • WER (word error rate): 0.9503
  • CER (character error rate): 0.3333
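The gap between WER and CER is expected for Mandarin. WER is typically computed over whitespace-separated words, and Chinese transcripts are not written with spaces, so a whole utterance tends to count as one "word" and any single mistake makes it entirely wrong. The minimal sketch below assumes plain Levenshtein-based WER/CER; the helper names `edit_distance`, `wer` and `cer` are hypothetical and are not the evaluation code used for this model. It shows one wrong character yielding a WER of 1.0 but a CER of 1/6.

```python
# Hypothetical helpers illustrating WER vs. CER on unsegmented Chinese text.
# Not the evaluation code used to produce the numbers in this model card.


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    return edit_distance(ref, hyp) / len(ref)


reference = "今天天气很好"   # six characters, no spaces
hypothesis = "今天天器很好"  # one character wrong

print(wer(reference, hypothesis))  # whole sentence is a single "word", so WER is 1.0
print(cer(reference, hypothesis))  # one error out of six characters
```

With real data the same effect appears at scale: WER stays close to 1.0 (and can exceed it when insertions occur), while CER tracks actual transcription quality.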

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 13
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 26
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100
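The linear scheduler with 500 warmup steps raises the learning rate linearly from 0 to 0.0003 and then decays it linearly to 0 by the end of training, which is the warmup-then-linear-decay behaviour of the Transformers `linear` scheduler. The pure-Python sketch below reproduces that curve and the effective batch size. It assumes a single device and about 210 optimizer steps per epoch, inferred from the training-results table (step 8400 is logged at epoch 40.0), giving roughly 21,000 steps in total; `linear_lr` and the constant names are illustrative, not taken from the original training script.

```python
# Sketch of the schedule implied by the hyperparameters above.
# Assumptions: single device; ~210 optimizer steps per epoch (inferred from the
# training-results table), so TOTAL_STEPS = 100 * 210 = 21000.

LEARNING_RATE = 0.0003
TRAIN_BATCH_SIZE = 13
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 500
NUM_EPOCHS = 100
STEPS_PER_EPOCH = 210  # assumption, not stated in the card
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH

# Gradients from 2 micro-batches of 13 examples are accumulated per optimizer step.
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def linear_lr(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print("effective batch size:", TOTAL_TRAIN_BATCH_SIZE)
for step in (0, 250, 500, 10750, 21000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 500 and falls to half its peak around step 10750, roughly the middle of training.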

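Training results

Validation metrics were recorded every 400 steps. "No log" in the first row means no training loss had been logged yet at that point; where the same training loss appears in consecutive rows (e.g. 2.3906 at steps 2000 and 2400), no new training-loss value was logged between those two evaluations.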

Training loss  Epoch  Step   Validation loss  WER     CER
No log         1.9    400    6.7551           1.0     1.0
34.7845        3.81   800    6.4563           1.0     1.0
6.4358         5.71   1200   4.2319           1.0074  0.7454
4.2052         7.62   1600   2.6538           1.0200  0.5562
2.3906         9.52   2000   2.3565           1.0063  0.5147
2.3906         11.43  2400   2.1287           0.9863  0.4822
1.93           13.33  2800   1.9585           0.9812  0.4528
1.6322         15.24  3200   1.8771           0.9937  0.4381
1.3629         17.14  3600   1.8405           0.9926  0.4242
1.166          19.05  4000   1.7674           0.9989  0.4140
1.166          20.95  4400   1.7879           0.9795  0.4047
0.9915         22.86  4800   1.7597           1.0126  0.4080
0.8517         24.76  5200   1.7726           0.9829  0.3966
0.7143         26.67  5600   1.7623           0.9732  0.3863
0.6267         28.57  6000   1.8164           0.9720  0.3863
0.6267         30.48  6400   1.8136           0.9680  0.3801
0.5389         32.38  6800   1.8696           0.9652  0.3812
0.4764         34.29  7200   1.8625           0.9663  0.3744
0.4095         36.19  7600   1.8868           0.9618  0.3683
0.3594         38.1   8000   1.8834           0.9623  0.3699
0.3594         40.0   8400   1.9155           0.9589  0.3670
0.3064         41.9   8800   1.9268           0.9652  0.3688
0.2825         43.81  9200   1.9527           0.9697  0.3674
0.2524         45.71  9600   1.9726           0.9686  0.3617
0.2272         47.62  10000  1.9594           0.9629  0.3619
0.2272         49.52  10400  1.9799           0.9635  0.3607
0.2042         51.43  10800  2.0175           0.9669  0.3582
0.1975         53.33  11200  2.0246           0.9589  0.3571
0.1827         55.24  11600  2.0535           0.9703  0.3600
0.1677         57.14  12000  2.0458           0.9583  0.3555
0.1677         59.05  12400  2.0893           0.9572  0.3583
0.1626         60.95  12800  2.0729           0.9600  0.3557
0.155          62.86  13200  2.0706           0.9572  0.3538
0.1456         64.76  13600  2.0761           0.9532  0.3553
0.1337         66.67  14000  2.0349           0.9589  0.3474
0.1337         68.57  14400  2.0844           0.9549  0.3484
0.1274         70.48  14800  2.0874           0.9578  0.3505
0.1198         72.38  15200  2.0813           0.9526  0.3473
0.1164         74.29  15600  2.0866           0.9498  0.3473
0.1105         76.19  16000  2.0688           0.9486  0.3421
0.1105         78.1   16400  2.0854           0.9498  0.3431
0.1053         80.0   16800  2.0749           0.9503  0.3414
0.1            81.9   17200  2.0622           0.9543  0.3407
0.0977         83.81  17600  2.0678           0.9532  0.3396
0.0906         85.71  18000  2.0650           0.9515  0.3383
0.0906         87.62  18400  2.0631           0.9492  0.3378
0.0867         89.52  18800  2.0633           0.9521  0.3365
0.0836         91.43  19200  2.0606           0.9532  0.3346
0.0819         93.33  19600  2.0671           0.9538  0.3355
0.0768         95.24  20000  2.0661           0.9509  0.3338
0.0768         97.14  20400  2.0564           0.9498  0.3335
0.0752         99.05  20800  2.0566           0.9503  0.3333
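In the training-results table, validation loss reaches its minimum (1.7597 at step 4800) long before CER does (0.3333 at the final step 20800), and WER bottoms out at yet another step (0.9486 at step 16000), so the metric used to choose a checkpoint matters. In the Transformers `Trainer` this choice is made with `metric_for_best_model` when `load_best_model_at_end` is enabled; whether either was used for this model is not stated. The sketch below uses a handful of rows copied from the table; the `rows` subset and the `best_step` helper are illustrative only.

```python
# A subset of rows copied from the training-results table:
# (step, validation_loss, wer, cer). The subset includes each column's minimum.
rows = [
    (4000, 1.7674, 0.9989, 0.4140),
    (4800, 1.7597, 1.0126, 0.4080),
    (5600, 1.7623, 0.9732, 0.3863),
    (16000, 2.0688, 0.9486, 0.3421),
    (20400, 2.0564, 0.9498, 0.3335),
    (20800, 2.0566, 0.9503, 0.3333),
]

METRIC_COLUMNS = {"validation_loss": 1, "wer": 2, "cer": 3}


def best_step(metric: str) -> int:
    """Return the step with the lowest value of the given metric (lower is better)."""
    column = METRIC_COLUMNS[metric]
    return min(rows, key=lambda row: row[column])[0]


for metric in METRIC_COLUMNS:
    print(metric, best_step(metric))
```

For a Mandarin model, selecting on CER rather than validation loss or WER favours the final checkpoint, which is the one whose results are reported at the top of this card.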

Framework versions

  • Transformers 4.32.0.dev0
  • PyTorch 2.0.1+cu118
  • Datasets 2.14.3
  • Tokenizers 0.13.3

Model repository: scarlett623/wav2vec2-large-xlsr53-zh-cn-subset20-colab (fine-tuned from facebook/wav2vec2-large-xlsr-53)
