---
language:
  - sr
license: apache-2.0
tags:
  - automatic-speech-recognition
  - mozilla-foundation/common_voice_8_0
  - generated_from_trainer
  - robust-speech-event
  - xlsr-fine-tuning-week
  - hf-asr-leaderboard
datasets:
  - mozilla-foundation/common_voice_8_0
model-index:
  - name: wav2vec2-xls-r-300m-sr-cv8
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 8.0
          type: mozilla-foundation/common_voice_8_0
          args: sr
        metrics:
          - name: Test WER
            type: wer
            value: 48.53
          - name: Test CER
            type: cer
            value: 18.4
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Dev Data
          type: speech-recognition-community-v2/dev_data
          args: sr
        metrics:
          - name: Test WER
            type: wer
            value: 97.43
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Test Data
          type: speech-recognition-community-v2/eval_data
          args: sr
        metrics:
          - name: Test WER
            type: wer
            value: 96.69
---

# Serbian wav2vec2-xls-r-300m-sr-cv8

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Serbian (`sr`) subset of the Common Voice 8.0 dataset. It achieves the following results on the evaluation set:

- Loss: 1.7302
- WER: 0.4825
- CER: 0.1847
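
As a usage reference (not part of the original card), here is a minimal transcription sketch. It assumes `torchaudio` is installed and that a mono audio file exists at the hypothetical path `audio.wav`:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the fine-tuned model and its processor from the Hub
processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-sr-cv8")
model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-sr-cv8")

# Load audio and resample to the 16 kHz rate the model expects
speech, sample_rate = torchaudio.load("audio.wav")  # hypothetical input file
speech = torchaudio.functional.resample(speech, sample_rate, 16_000).squeeze(0)

# Extract features and decode with greedy CTC decoding
inputs = processor(speech.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```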

Evaluation on mozilla-foundation/common_voice_8_0 gave the following results:

- WER: 0.48530097993467103
- CER: 0.18413288165227845

Evaluation on speech-recognition-community-v2/dev_data gave the following results:

- WER: 0.9718373107518604
- CER: 0.8302740620263108

The model can be evaluated using the attached `eval.py` script:

```bash
python eval.py --model_id comodoro/wav2vec2-xls-r-300m-sr-cv8 --dataset mozilla-foundation/common_voice_8_0 --split test --config sr
```
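
For readers without the script, the following sketch approximates the kind of computation it performs (it is not the attached `eval.py`; it assumes the `datasets` and `jiwer` packages, omits the text normalization the script may apply, and requires accepting the gated dataset's terms and logging in with `huggingface-cli login`):

```python
import torch
from datasets import Audio, load_dataset
from jiwer import cer, wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-sr-cv8")
model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-sr-cv8")

# Common Voice 8.0 is gated; authentication with the Hub is required
ds = load_dataset("mozilla-foundation/common_voice_8_0", "sr",
                  split="test", use_auth_token=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for sample in ds:
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predictions.append(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
    references.append(sample["sentence"])

print("WER:", wer(references, predictions))
print("CER:", cer(references, predictions))
```

Because this sketch skips normalization (lowercasing, punctuation stripping), its scores may differ slightly from the reported numbers.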

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 800
- mixed_precision_training: Native AMP
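
In `transformers` terms, these settings correspond roughly to the following `TrainingArguments` (a sketch, not the original training script; the output path is an assumption, and the Adam betas/epsilon listed above are the library defaults):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-300m-sr-cv8",  # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=300,
    num_train_epochs=800,
    fp16=True,  # "Native AMP" mixed precision
)
```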

## Training results

| Training Loss | Epoch | Step  | Validation Loss | WER    | CER    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|
| 5.6536        | 15.0  | 1200  | 2.9744          | 1.0    | 1.0    |
| 2.7935        | 30.0  | 2400  | 1.6613          | 0.8998 | 0.4670 |
| 1.6538        | 45.0  | 3600  | 0.9248          | 0.6918 | 0.2699 |
| 1.2446        | 60.0  | 4800  | 0.9151          | 0.6452 | 0.2398 |
| 1.0766        | 75.0  | 6000  | 0.9110          | 0.5995 | 0.2207 |
| 0.9548        | 90.0  | 7200  | 1.0273          | 0.5921 | 0.2149 |
| 0.8919        | 105.0 | 8400  | 0.9929          | 0.5646 | 0.2117 |
| 0.8185        | 120.0 | 9600  | 1.0850          | 0.5483 | 0.2069 |
| 0.7692        | 135.0 | 10800 | 1.1001          | 0.5394 | 0.2055 |
| 0.7249        | 150.0 | 12000 | 1.1018          | 0.5380 | 0.1958 |
| 0.6786        | 165.0 | 13200 | 1.1344          | 0.5114 | 0.1941 |
| 0.6432        | 180.0 | 14400 | 1.1516          | 0.5054 | 0.1905 |
| 0.6009        | 195.0 | 15600 | 1.3149          | 0.5324 | 0.1991 |
| 0.5773        | 210.0 | 16800 | 1.2468          | 0.5124 | 0.1903 |
| 0.559         | 225.0 | 18000 | 1.2186          | 0.4956 | 0.1922 |
| 0.5298        | 240.0 | 19200 | 1.4483          | 0.5333 | 0.2085 |
| 0.5136        | 255.0 | 20400 | 1.2871          | 0.4802 | 0.1846 |
| 0.4824        | 270.0 | 21600 | 1.2891          | 0.4974 | 0.1885 |
| 0.4669        | 285.0 | 22800 | 1.3283          | 0.4942 | 0.1878 |
| 0.4511        | 300.0 | 24000 | 1.4502          | 0.5002 | 0.1994 |
| 0.4337        | 315.0 | 25200 | 1.4714          | 0.5035 | 0.1911 |
| 0.4221        | 330.0 | 26400 | 1.4971          | 0.5124 | 0.1962 |
| 0.3994        | 345.0 | 27600 | 1.4473          | 0.5007 | 0.1920 |
| 0.3892        | 360.0 | 28800 | 1.3904          | 0.4937 | 0.1887 |
| 0.373         | 375.0 | 30000 | 1.4971          | 0.4946 | 0.1902 |
| 0.3657        | 390.0 | 31200 | 1.4208          | 0.4900 | 0.1821 |
| 0.3559        | 405.0 | 32400 | 1.4648          | 0.4895 | 0.1835 |
| 0.3476        | 420.0 | 33600 | 1.4848          | 0.4946 | 0.1829 |
| 0.3276        | 435.0 | 34800 | 1.5597          | 0.4979 | 0.1873 |
| 0.3193        | 450.0 | 36000 | 1.7329          | 0.5040 | 0.1980 |
| 0.3078        | 465.0 | 37200 | 1.6379          | 0.4937 | 0.1882 |
| 0.3058        | 480.0 | 38400 | 1.5878          | 0.4942 | 0.1921 |
| 0.2987        | 495.0 | 39600 | 1.5590          | 0.4811 | 0.1846 |
| 0.2931        | 510.0 | 40800 | 1.6001          | 0.4825 | 0.1849 |
| 0.276         | 525.0 | 42000 | 1.7388          | 0.4942 | 0.1918 |
| 0.2702        | 540.0 | 43200 | 1.7037          | 0.4839 | 0.1866 |
| 0.2619        | 555.0 | 44400 | 1.6704          | 0.4755 | 0.1840 |
| 0.262         | 570.0 | 45600 | 1.6042          | 0.4751 | 0.1865 |
| 0.2528        | 585.0 | 46800 | 1.6402          | 0.4821 | 0.1865 |
| 0.2442        | 600.0 | 48000 | 1.6693          | 0.4886 | 0.1862 |
| 0.244         | 615.0 | 49200 | 1.6203          | 0.4765 | 0.1792 |
| 0.2388        | 630.0 | 50400 | 1.6829          | 0.4830 | 0.1828 |
| 0.2362        | 645.0 | 51600 | 1.8100          | 0.4928 | 0.1888 |
| 0.2224        | 660.0 | 52800 | 1.7746          | 0.4932 | 0.1899 |
| 0.2218        | 675.0 | 54000 | 1.7752          | 0.4946 | 0.1901 |
| 0.2201        | 690.0 | 55200 | 1.6775          | 0.4788 | 0.1844 |
| 0.2147        | 705.0 | 56400 | 1.7085          | 0.4844 | 0.1851 |
| 0.2103        | 720.0 | 57600 | 1.7624          | 0.4848 | 0.1864 |
| 0.2101        | 735.0 | 58800 | 1.7213          | 0.4783 | 0.1835 |
| 0.1983        | 750.0 | 60000 | 1.7452          | 0.4848 | 0.1856 |
| 0.2015        | 765.0 | 61200 | 1.7525          | 0.4872 | 0.1869 |
| 0.1969        | 780.0 | 62400 | 1.7443          | 0.4844 | 0.1852 |
| 0.2043        | 795.0 | 63600 | 1.7302          | 0.4825 | 0.1847 |

## Framework versions

- Transformers 4.16.2
- Pytorch 1.10.1+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0