metadata
language:
  - zh
license: apache-2.0
tags:
  - automatic-speech-recognition
  - generated_from_trainer
  - hf-asr-leaderboard
  - mozilla-foundation/common_voice_8_0
  - robust-speech-event
  - zh-HK
datasets:
  - mozilla-foundation/common_voice_8_0
model-index:
  - name: XLS-R-300M - Chinese_HongKong (Cantonese)
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 8
          type: mozilla-foundation/common_voice_8_0
          args: zh-HK
        metrics:
          - name: Test WER
            type: wer
            value: 0.8111349803079126
          - name: Test CER
            type: cer
            value: 0.21962250882996914
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Dev Data
          type: speech-recognition-community-v2/dev_data
          args: zh-HK
        metrics:
          - name: Test WER
            type: wer
            value: 1
          - name: Test CER
            type: cer
            value: 0.6160564326503191
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 8
          type: mozilla-foundation/common_voice_8_0
          args: zh-HK
        metrics:
          - name: Test WER with LM
            type: wer
            value: 0.8055853920515574
          - name: Test CER with LM
            type: cer
            value: 0.21578686612008757
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Dev Data
          type: speech-recognition-community-v2/dev_data
          args: zh-HK
        metrics:
          - name: Test WER with LM
            type: wer
            value: 1.0012453300124533
          - name: Test CER with LM
            type: cer
            value: 0.6153006382264025
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Test Data
          type: speech-recognition-community-v2/eval_data
          args: zh-HK
        metrics:
          - name: Test CER
            type: cer
            value: 61.55

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the mozilla-foundation/common_voice_8_0 dataset (zh-HK configuration). At the end of training (epoch 100), it achieves the following results on the evaluation set:

  • Loss: 1.4848
  • WER: 0.8004
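The WER here (about 0.80) is far higher than the CER reported in the metadata (about 0.22), and one WER value in the metadata even exceeds 1. Both effects follow from how the metrics are defined: WER is the word-level edit distance divided by the number of reference words, and written Cantonese does not separate words with spaces. If "words" are taken to be whitespace-separated tokens (an assumption; the exact normalization used by eval.py is not documented here), a whole phrase counts as one word, so a single wrong character makes the entire token wrong. Insertions also count as errors, which is how WER can go above 1. The following is a minimal from-scratch sketch of the two definitions, not the evaluation script's actual implementation; the example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference token r
                cur[j - 1] + 1,          # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute, or match at cost 0
            )
        prev = cur
    return prev[-1]


def wer(refs, hyps):
    """Word error rate, treating whitespace-separated tokens as words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def cer(refs, hyps):
    """Character error rate over the raw strings."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


# One wrong character in an unsegmented sentence: WER 1.0, CER 0.2.
print(wer(["我哋去飲茶"], ["我地去飲茶"]), cer(["我哋去飲茶"], ["我地去飲茶"]))
# An extra space splits one reference token into two: WER 2.0.
print(wer(["你好"], ["你 好"]))
```

This is why CER is usually the more informative metric for Chinese-script ASR.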

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100.0
  • mixed_precision_training: Native AMP
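The batch size and learning-rate settings above fit together as follows: total_train_batch_size is train_batch_size × gradient_accumulation_steps (32 × 2 = 64, which implies a single device; that is an inference, not something the card states), and the linear scheduler ramps the learning rate from 0 to 0.0003 over the first 500 steps and then decays it linearly to 0 at the last step. The total of 18,300 optimizer steps comes from the Step column of the training results (100 epochs × 183 steps). The schedule function below mirrors the formula used by the transformers `get_linear_schedule_with_warmup` helper; it is a standalone sketch, not the training code.

```python
# Values copied from the hyperparameter list; NUM_DEVICES is an assumption.
LEARNING_RATE = 3e-4
WARMUP_STEPS = 500
PER_DEVICE_TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1                # assumed: consistent with 32 * 2 == 64
TOTAL_STEPS = 100 * 183        # 100 epochs x 183 optimizer steps per epoch


def total_train_batch_size(per_device=PER_DEVICE_TRAIN_BATCH_SIZE,
                           accumulation=GRADIENT_ACCUMULATION_STEPS,
                           devices=NUM_DEVICES):
    """Examples contributing to one optimizer update."""
    return per_device * accumulation * devices


def linear_warmup_lr(step, base_lr=LEARNING_RATE,
                     warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for s in (0, 250, 500, 9_400, 18_300):
    print(s, linear_warmup_lr(s))
```

The peak rate is reached at step 500, a little under three epochs into training, and is halved by step 9,400.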

Training results

| Training Loss | Epoch | Step | Validation Loss | WER |
|:-------------:|:-----:|:----:|:---------------:|:---:|
| No log | 1.0 | 183 | 47.8442 | 1.0 |
| No log | 2.0 | 366 | 6.3109 | 1.0 |
| 41.8902 | 3.0 | 549 | 6.2392 | 1.0 |
| 41.8902 | 4.0 | 732 | 5.9739 | 1.1123 |
| 41.8902 | 5.0 | 915 | 4.9014 | 1.9474 |
| 5.5817 | 6.0 | 1098 | 3.9892 | 1.0188 |
| 5.5817 | 7.0 | 1281 | 3.5080 | 1.0104 |
| 5.5817 | 8.0 | 1464 | 3.0797 | 0.9905 |
| 3.5579 | 9.0 | 1647 | 2.8111 | 0.9836 |
| 3.5579 | 10.0 | 1830 | 2.6726 | 0.9815 |
| 2.7771 | 11.0 | 2013 | 2.7177 | 0.9809 |
| 2.7771 | 12.0 | 2196 | 2.3582 | 0.9692 |
| 2.7771 | 13.0 | 2379 | 2.1708 | 0.9757 |
| 2.3488 | 14.0 | 2562 | 2.0491 | 0.9526 |
| 2.3488 | 15.0 | 2745 | 1.8518 | 0.9378 |
| 2.3488 | 16.0 | 2928 | 1.6845 | 0.9286 |
| 1.7859 | 17.0 | 3111 | 1.6412 | 0.9280 |
| 1.7859 | 18.0 | 3294 | 1.5488 | 0.9035 |
| 1.7859 | 19.0 | 3477 | 1.4546 | 0.9010 |
| 1.3898 | 20.0 | 3660 | 1.5147 | 0.9201 |
| 1.3898 | 21.0 | 3843 | 1.4467 | 0.8959 |
| 1.1291 | 22.0 | 4026 | 1.4743 | 0.9035 |
| 1.1291 | 23.0 | 4209 | 1.3827 | 0.8762 |
| 1.1291 | 24.0 | 4392 | 1.3437 | 0.8792 |
| 0.8993 | 25.0 | 4575 | 1.2895 | 0.8577 |
| 0.8993 | 26.0 | 4758 | 1.2928 | 0.8558 |
| 0.8993 | 27.0 | 4941 | 1.2947 | 0.9163 |
| 0.6298 | 28.0 | 5124 | 1.3151 | 0.8738 |
| 0.6298 | 29.0 | 5307 | 1.2972 | 0.8514 |
| 0.6298 | 30.0 | 5490 | 1.3030 | 0.8432 |
| 0.4757 | 31.0 | 5673 | 1.3264 | 0.8364 |
| 0.4757 | 32.0 | 5856 | 1.3131 | 0.8421 |
| 0.3735 | 33.0 | 6039 | 1.3457 | 0.8588 |
| 0.3735 | 34.0 | 6222 | 1.3450 | 0.8473 |
| 0.3735 | 35.0 | 6405 | 1.3452 | 0.9218 |
| 0.3253 | 36.0 | 6588 | 1.3754 | 0.8397 |
| 0.3253 | 37.0 | 6771 | 1.3554 | 0.8353 |
| 0.3253 | 38.0 | 6954 | 1.3532 | 0.8312 |
| 0.2816 | 39.0 | 7137 | 1.3694 | 0.8345 |
| 0.2816 | 40.0 | 7320 | 1.3953 | 0.8296 |
| 0.2397 | 41.0 | 7503 | 1.3858 | 0.8293 |
| 0.2397 | 42.0 | 7686 | 1.3959 | 0.8402 |
| 0.2397 | 43.0 | 7869 | 1.4350 | 0.9318 |
| 0.2084 | 44.0 | 8052 | 1.4004 | 0.8806 |
| 0.2084 | 45.0 | 8235 | 1.3871 | 0.8255 |
| 0.2084 | 46.0 | 8418 | 1.4060 | 0.8252 |
| 0.1853 | 47.0 | 8601 | 1.3992 | 0.8501 |
| 0.1853 | 48.0 | 8784 | 1.4186 | 0.8252 |
| 0.1853 | 49.0 | 8967 | 1.4120 | 0.8165 |
| 0.1671 | 50.0 | 9150 | 1.4166 | 0.8214 |
| 0.1671 | 51.0 | 9333 | 1.4411 | 0.8501 |
| 0.1513 | 52.0 | 9516 | 1.4692 | 0.8394 |
| 0.1513 | 53.0 | 9699 | 1.4640 | 0.8391 |
| 0.1513 | 54.0 | 9882 | 1.4501 | 0.8419 |
| 0.133 | 55.0 | 10065 | 1.4134 | 0.8351 |
| 0.133 | 56.0 | 10248 | 1.4593 | 0.8405 |
| 0.133 | 57.0 | 10431 | 1.4560 | 0.8389 |
| 0.1198 | 58.0 | 10614 | 1.4734 | 0.8334 |
| 0.1198 | 59.0 | 10797 | 1.4649 | 0.8318 |
| 0.1198 | 60.0 | 10980 | 1.4659 | 0.8100 |
| 0.1109 | 61.0 | 11163 | 1.4784 | 0.8119 |
| 0.1109 | 62.0 | 11346 | 1.4938 | 0.8149 |
| 0.1063 | 63.0 | 11529 | 1.5050 | 0.8152 |
| 0.1063 | 64.0 | 11712 | 1.4773 | 0.8176 |
| 0.1063 | 65.0 | 11895 | 1.4836 | 0.8261 |
| 0.0966 | 66.0 | 12078 | 1.4979 | 0.8157 |
| 0.0966 | 67.0 | 12261 | 1.4603 | 0.8048 |
| 0.0966 | 68.0 | 12444 | 1.4803 | 0.8127 |
| 0.0867 | 69.0 | 12627 | 1.4974 | 0.8130 |
| 0.0867 | 70.0 | 12810 | 1.4721 | 0.8078 |
| 0.0867 | 71.0 | 12993 | 1.4644 | 0.8192 |
| 0.0827 | 72.0 | 13176 | 1.4835 | 0.8138 |
| 0.0827 | 73.0 | 13359 | 1.4934 | 0.8122 |
| 0.0734 | 74.0 | 13542 | 1.4951 | 0.8062 |
| 0.0734 | 75.0 | 13725 | 1.4908 | 0.8070 |
| 0.0734 | 76.0 | 13908 | 1.4876 | 0.8124 |
| 0.0664 | 77.0 | 14091 | 1.4934 | 0.8053 |
| 0.0664 | 78.0 | 14274 | 1.4603 | 0.8048 |
| 0.0664 | 79.0 | 14457 | 1.4732 | 0.8073 |
| 0.0602 | 80.0 | 14640 | 1.4925 | 0.8078 |
| 0.0602 | 81.0 | 14823 | 1.4812 | 0.8064 |
| 0.057 | 82.0 | 15006 | 1.4950 | 0.8013 |
| 0.057 | 83.0 | 15189 | 1.4785 | 0.8056 |
| 0.057 | 84.0 | 15372 | 1.4856 | 0.7993 |
| 0.0517 | 85.0 | 15555 | 1.4755 | 0.8034 |
| 0.0517 | 86.0 | 15738 | 1.4813 | 0.8034 |
| 0.0517 | 87.0 | 15921 | 1.4966 | 0.8048 |
| 0.0468 | 88.0 | 16104 | 1.4883 | 0.8002 |
| 0.0468 | 89.0 | 16287 | 1.4746 | 0.8023 |
| 0.0468 | 90.0 | 16470 | 1.4697 | 0.7974 |
| 0.0426 | 91.0 | 16653 | 1.4775 | 0.8004 |
| 0.0426 | 92.0 | 16836 | 1.4852 | 0.8023 |
| 0.0387 | 93.0 | 17019 | 1.4868 | 0.8004 |
| 0.0387 | 94.0 | 17202 | 1.4785 | 0.8021 |
| 0.0387 | 95.0 | 17385 | 1.4892 | 0.8015 |
| 0.0359 | 96.0 | 17568 | 1.4862 | 0.8018 |
| 0.0359 | 97.0 | 17751 | 1.4851 | 0.8007 |
| 0.0359 | 98.0 | 17934 | 1.4846 | 0.7999 |
| 0.0347 | 99.0 | 18117 | 1.4852 | 0.7993 |
| 0.0347 | 100.0 | 18300 | 1.4848 | 0.8004 |
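Two regularities in the table are worth spelling out. Each epoch is 183 optimizer steps, so the Step column is simply epoch × 183. The Training Loss column reads "No log" for the first two epochs and then repeats across consecutive rows; this is consistent with the training loss being logged only every 500 steps (500 is the default `logging_steps` in transformers' `TrainingArguments`; that it was left at the default here is an assumption), so each row shows the most recently logged value. The sketch below reproduces both columns' bookkeeping; `last_logged_step` is a hypothetical helper written for illustration.

```python
STEPS_PER_EPOCH = 183  # epoch 1 ends at step 183 in the table
LOGGING_STEPS = 500    # assumption: transformers' default logging interval


def step_at_epoch_end(epoch, steps_per_epoch=STEPS_PER_EPOCH):
    """Optimizer step reached at the end of a given epoch."""
    return epoch * steps_per_epoch


def last_logged_step(step, logging_steps=LOGGING_STEPS):
    """Most recent step at which training loss was logged, or None ("No log")."""
    logged = (step // logging_steps) * logging_steps
    return logged if logged > 0 else None


for epoch in (1, 2, 3, 5, 6, 8, 9, 11, 100):
    step = step_at_epoch_end(epoch)
    print(epoch, step, last_logged_step(step))
```

For example, epochs 6 to 8 (steps 1098 to 1464) all fall after the step-1000 log and before the step-1500 log, which matches the repeated 5.5817 in those rows.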

Evaluation Commands

  1. To evaluate on mozilla-foundation/common_voice_8_0 (config zh-HK, split test):
python eval.py --model_id ivanlau/wav2vec2-large-xls-r-300m-cantonese --dataset mozilla-foundation/common_voice_8_0 --config zh-HK --split test --log_outputs
  2. To evaluate on speech-recognition-community-v2/dev_data (config zh-HK, split validation, using 5 s chunks with a 1 s stride):
python eval.py --model_id ivanlau/wav2vec2-large-xls-r-300m-cantonese --dataset speech-recognition-community-v2/dev_data --config zh-HK --split validation --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs
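The dev-data command adds `--chunk_length_s 5.0 --stride_length_s 1.0`. These options appear to be forwarded to the chunked inference of the transformers automatic-speech-recognition pipeline, which splits long recordings into overlapping windows, runs the model on each window, and drops the predictions in the overlapping margins before joining the text. The sketch below computes such windows at 16 kHz (the input sampling rate XLS-R models expect). `chunk_windows` is a hypothetical helper that illustrates the idea; the pipeline's own boundary handling may differ in detail.

```python
def chunk_windows(num_samples, sampling_rate=16_000,
                  chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end, left_stride, right_stride) sample indices for each window.

    Each window is chunk_length_s long; consecutive windows overlap by
    2 * stride_length_s, and the stride margins are discarded after inference.
    """
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # distance between window starts
    if step <= 0:
        raise ValueError("chunk_length_s must exceed 2 * stride_length_s")

    windows, start = [], 0
    while True:
        end = min(start + chunk_len, num_samples)
        left = 0 if start == 0 else stride          # no left margin on the first window
        right = 0 if end == num_samples else stride  # no right margin on the last window
        windows.append((start, end, left, right))
        if end == num_samples:
            return windows
        start += step


# A 12-second clip: four 5 s windows, starting every 3 s.
for start, end, left, right in chunk_windows(12 * 16_000):
    print(start, end, "keep", start + left, end - right)
```

The kept regions (start + left to end - right) tile the whole recording without gaps, so every sample's prediction comes from a window where it had at least 1 s of context on each side, except at the very start and end of the clip.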

Framework versions

  • Transformers 4.17.0.dev0
  • PyTorch 1.10.2+cu102
  • Datasets 1.18.3
  • Tokenizers 0.11.0