ivanlau/wav2vec2-large-xls-r-300m-cantonese

metadata

language:
  - zh
license: apache-2.0
tags:
  - automatic-speech-recognition
  - generated_from_trainer
  - hf-asr-leaderboard
  - mozilla-foundation/common_voice_8_0
  - robust-speech-event
  - zh-HK
datasets:
  - mozilla-foundation/common_voice_8_0
model-index:
  - name: XLS-R-300M - Chinese_HongKong (Cantonese)
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 8
          type: mozilla-foundation/common_voice_8_0
          args: zh-HK
        metrics:
          - name: Test WER
            type: wer
            value: 0.8111349803079126
          - name: Test CER
            type: cer
            value: 0.21962250882996914
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Dev Data
          type: speech-recognition-community-v2/dev_data
          args: zh-HK
        metrics:
          - name: Test WER
            type: wer
            value: 1
          - name: Test CER
            type: cer
            value: 0.6160564326503191
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 8
          type: mozilla-foundation/common_voice_8_0
          args: zh-HK
        metrics:
          - name: Test WER with LM
            type: wer
            value: 0.8055853920515574
          - name: Test CER with LM
            type: cer
            value: 0.21578686612008757
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Dev Data
          type: speech-recognition-community-v2/dev_data
          args: zh-HK
        metrics:
          - name: Test WER with LM
            type: wer
            value: 1.0012453300124533
          - name: Test CER with LM
            type: cer
            value: 0.6153006382264025
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Test Data
          type: speech-recognition-community-v2/eval_data
          args: zh-HK
        metrics:
          - name: Test CER
            type: cer
            value: 61.55

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the zh-HK (Chinese, Hong Kong) configuration of the mozilla-foundation/common_voice_8_0 dataset. It achieves the following results on the evaluation set, taken from the final checkpoint (epoch 100):

Loss: 1.4848
WER: 0.8004
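Written Chinese normally has no spaces between words. If WER is computed over whitespace-separated tokens (an assumption here, though jiwer-based WER metrics work this way), each unsegmented sentence counts as one or a few very long "words", so a single wrong character can make the whole word wrong. This is a plausible reason the WER (about 0.80) is so much higher than the CER (about 0.22 in the metadata). The minimal sketch below uses a plain Levenshtein distance and a hypothetical sentence pair to illustrate the effect. It is not the exact scoring code used by eval.py.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            ))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    # Word error rate: tokens are whitespace-separated words.
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    # Character error rate: tokens are single characters (spaces removed).
    ref_chars = list(ref.replace(" ", ""))
    return edit_distance(ref_chars, list(hyp.replace(" ", ""))) / len(ref_chars)


reference = "我哋去食飯"   # hypothetical unsegmented Cantonese reference
hypothesis = "我地去食飯"  # one character substituted

print(wer(reference, hypothesis))  # 1.0: the whole sentence is a single "word"
print(cer(reference, hypothesis))  # 0.2: 1 error out of 5 characters
```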

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 100.0
mixed_precision_training: Native AMP
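A per-device batch of 32 with 2 gradient-accumulation steps means each optimizer update sees 32 × 2 = 64 examples, which matches total_train_batch_size on a single device. The linear scheduler warms the learning rate up over the first 500 steps and then decays it linearly to zero. The training results below show 183 steps per epoch, so 100 epochs give 18,300 optimizer steps. The sketch below reproduces that schedule in plain Python as an illustration. It follows the usual linear-warmup formula but is not the Trainer's own code, and the constant names are hypothetical.

```python
# Values taken from the hyperparameter list and the results table.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
STEPS_PER_EPOCH = 183  # from the "Step" column (183 optimizer steps per epoch)
NUM_EPOCHS = 100

# Effective batch size per optimizer step, assuming a single device.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step: int) -> float:
    """Linear warmup to the peak LR, then linear decay to zero at total_steps."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)


for step in (0, 250, 500, 9400, 18300):
    print(step, f"{linear_lr(step):.2e}")
```

With these values the learning rate peaks at 3e-4 at step 500, has fallen to half of that by step 9,400, and reaches zero at the last step.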

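Training results

Training loss appears to be logged every 500 steps, so each logged value repeats across the epochs that follow it, and the first two epochs show "No log". WER can exceed 1.0 (as at epoch 5) because inserted words also count as errors.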

Training Loss	Epoch	Step	Validation Loss	Wer
No log	1.0	183	47.8442	1.0
No log	2.0	366	6.3109	1.0
41.8902	3.0	549	6.2392	1.0
41.8902	4.0	732	5.9739	1.1123
41.8902	5.0	915	4.9014	1.9474
5.5817	6.0	1098	3.9892	1.0188
5.5817	7.0	1281	3.5080	1.0104
5.5817	8.0	1464	3.0797	0.9905
3.5579	9.0	1647	2.8111	0.9836
3.5579	10.0	1830	2.6726	0.9815
2.7771	11.0	2013	2.7177	0.9809
2.7771	12.0	2196	2.3582	0.9692
2.7771	13.0	2379	2.1708	0.9757
2.3488	14.0	2562	2.0491	0.9526
2.3488	15.0	2745	1.8518	0.9378
2.3488	16.0	2928	1.6845	0.9286
1.7859	17.0	3111	1.6412	0.9280
1.7859	18.0	3294	1.5488	0.9035
1.7859	19.0	3477	1.4546	0.9010
1.3898	20.0	3660	1.5147	0.9201
1.3898	21.0	3843	1.4467	0.8959
1.1291	22.0	4026	1.4743	0.9035
1.1291	23.0	4209	1.3827	0.8762
1.1291	24.0	4392	1.3437	0.8792
0.8993	25.0	4575	1.2895	0.8577
0.8993	26.0	4758	1.2928	0.8558
0.8993	27.0	4941	1.2947	0.9163
0.6298	28.0	5124	1.3151	0.8738
0.6298	29.0	5307	1.2972	0.8514
0.6298	30.0	5490	1.3030	0.8432
0.4757	31.0	5673	1.3264	0.8364
0.4757	32.0	5856	1.3131	0.8421
0.3735	33.0	6039	1.3457	0.8588
0.3735	34.0	6222	1.3450	0.8473
0.3735	35.0	6405	1.3452	0.9218
0.3253	36.0	6588	1.3754	0.8397
0.3253	37.0	6771	1.3554	0.8353
0.3253	38.0	6954	1.3532	0.8312
0.2816	39.0	7137	1.3694	0.8345
0.2816	40.0	7320	1.3953	0.8296
0.2397	41.0	7503	1.3858	0.8293
0.2397	42.0	7686	1.3959	0.8402
0.2397	43.0	7869	1.4350	0.9318
0.2084	44.0	8052	1.4004	0.8806
0.2084	45.0	8235	1.3871	0.8255
0.2084	46.0	8418	1.4060	0.8252
0.1853	47.0	8601	1.3992	0.8501
0.1853	48.0	8784	1.4186	0.8252
0.1853	49.0	8967	1.4120	0.8165
0.1671	50.0	9150	1.4166	0.8214
0.1671	51.0	9333	1.4411	0.8501
0.1513	52.0	9516	1.4692	0.8394
0.1513	53.0	9699	1.4640	0.8391
0.1513	54.0	9882	1.4501	0.8419
0.133	55.0	10065	1.4134	0.8351
0.133	56.0	10248	1.4593	0.8405
0.133	57.0	10431	1.4560	0.8389
0.1198	58.0	10614	1.4734	0.8334
0.1198	59.0	10797	1.4649	0.8318
0.1198	60.0	10980	1.4659	0.8100
0.1109	61.0	11163	1.4784	0.8119
0.1109	62.0	11346	1.4938	0.8149
0.1063	63.0	11529	1.5050	0.8152
0.1063	64.0	11712	1.4773	0.8176
0.1063	65.0	11895	1.4836	0.8261
0.0966	66.0	12078	1.4979	0.8157
0.0966	67.0	12261	1.4603	0.8048
0.0966	68.0	12444	1.4803	0.8127
0.0867	69.0	12627	1.4974	0.8130
0.0867	70.0	12810	1.4721	0.8078
0.0867	71.0	12993	1.4644	0.8192
0.0827	72.0	13176	1.4835	0.8138
0.0827	73.0	13359	1.4934	0.8122
0.0734	74.0	13542	1.4951	0.8062
0.0734	75.0	13725	1.4908	0.8070
0.0734	76.0	13908	1.4876	0.8124
0.0664	77.0	14091	1.4934	0.8053
0.0664	78.0	14274	1.4603	0.8048
0.0664	79.0	14457	1.4732	0.8073
0.0602	80.0	14640	1.4925	0.8078
0.0602	81.0	14823	1.4812	0.8064
0.057	82.0	15006	1.4950	0.8013
0.057	83.0	15189	1.4785	0.8056
0.057	84.0	15372	1.4856	0.7993
0.0517	85.0	15555	1.4755	0.8034
0.0517	86.0	15738	1.4813	0.8034
0.0517	87.0	15921	1.4966	0.8048
0.0468	88.0	16104	1.4883	0.8002
0.0468	89.0	16287	1.4746	0.8023
0.0468	90.0	16470	1.4697	0.7974
0.0426	91.0	16653	1.4775	0.8004
0.0426	92.0	16836	1.4852	0.8023
0.0387	93.0	17019	1.4868	0.8004
0.0387	94.0	17202	1.4785	0.8021
0.0387	95.0	17385	1.4892	0.8015
0.0359	96.0	17568	1.4862	0.8018
0.0359	97.0	17751	1.4851	0.8007
0.0359	98.0	17934	1.4846	0.7999
0.0347	99.0	18117	1.4852	0.7993
0.0347	100.0	18300	1.4848	0.8004
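The final checkpoint (epoch 100, WER 0.8004) is not the best row in the log. The lowest validation WER, 0.7974, occurs at epoch 90. The lowest validation loss, 1.2895, occurs at epoch 25, after which validation loss drifts upward while WER keeps improving. The following sketch parses a handful of rows copied from the table and picks the best epoch under each criterion. In a Trainer run, a similar selection can be requested with the load_best_model_at_end and metric_for_best_model arguments, but the card does not say whether those options were used.

```python
# A few rows copied verbatim from the table above (tab-separated).
RAW_LOG = """
Training Loss\tEpoch\tStep\tValidation Loss\tWer
No log\t1.0\t183\t47.8442\t1.0
0.8993\t25.0\t4575\t1.2895\t0.8577
0.057\t84.0\t15372\t1.4856\t0.7993
0.0468\t90.0\t16470\t1.4697\t0.7974
0.0347\t99.0\t18117\t1.4852\t0.7993
0.0347\t100.0\t18300\t1.4848\t0.8004
"""


def parse_log(text):
    """Turn the tab-separated log into a list of typed dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        rows.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if row["Training Loss"] == "No log" else float(row["Training Loss"]),
            "epoch": float(row["Epoch"]),
            "step": int(row["Step"]),
            "val_loss": float(row["Validation Loss"]),
            "wer": float(row["Wer"]),
        })
    return rows


rows = parse_log(RAW_LOG)
best_by_wer = min(rows, key=lambda r: r["wer"])
best_by_loss = min(rows, key=lambda r: r["val_loss"])
print("lowest WER:", best_by_wer["epoch"], best_by_wer["wer"])
print("lowest validation loss:", best_by_loss["epoch"], best_by_loss["val_loss"])
```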

Evaluation Commands

To evaluate on the test split of mozilla-foundation/common_voice_8_0:

python eval.py --model_id ivanlau/wav2vec2-large-xls-r-300m-cantonese --dataset mozilla-foundation/common_voice_8_0 --config zh-HK --split test --log_outputs

To evaluate on the validation split of speech-recognition-community-v2/dev_data, decoding the audio in 5-second chunks with a 1-second stride:

python eval.py --model_id ivanlau/wav2vec2-large-xls-r-300m-cantonese --dataset speech-recognition-community-v2/dev_data --config zh-HK --split validation --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs
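The dev-data command splits long recordings into overlapping windows. Assuming eval.py forwards --chunk_length_s and --stride_length_s to the transformers speech-recognition pipeline, where stride_length_s is the overlap on each side of a chunk, 5-second chunks with a 1-second stride advance by 5 - 2 × 1 = 3 seconds at a time. The simplified sketch below only computes those window boundaries for 16 kHz audio (an assumed sampling rate). The function name is hypothetical, and the sketch does not reproduce how the pipeline merges the overlapping predictions.

```python
SAMPLING_RATE = 16_000  # assumption: the model expects 16 kHz input


def chunk_windows(num_samples, chunk_length_s=5.0, stride_length_s=1.0, sr=SAMPLING_RATE):
    """Return (start, end) sample indices of overlapping windows.

    Each window is chunk_length_s long and overlaps its neighbours by
    stride_length_s on each side, so consecutive windows start
    chunk_length_s - 2 * stride_length_s apart.
    """
    chunk_len = int(round(chunk_length_s * sr))
    stride = int(round(stride_length_s * sr))
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("chunk_length_s must exceed 2 * stride_length_s")
    windows = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        windows.append((start, end))
        if end == num_samples:  # last window reaches the end of the audio
            break
    return windows


# A hypothetical 12-second clip.
for start, end in chunk_windows(12 * SAMPLING_RATE):
    print(f"{start / SAMPLING_RATE:.1f}s - {end / SAMPLING_RATE:.1f}s")
```

For a 12-second clip this yields windows at 0 to 5 s, 3 to 8 s, 6 to 11 s and 9 to 12 s, so every second of audio after the first window is seen with surrounding context.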

Framework versions

Transformers 4.17.0.dev0
PyTorch 1.10.2+cu102
Datasets 1.18.3
Tokenizers 0.11.0