anton-l's picture
anton-l
HF staff
Upload README.md
75ce1f7
---
language:
- sr
license: apache-2.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- generated_from_trainer
- robust-speech-event
- xlsr-fine-tuning-week
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_8_0
- name: Serbian comodoro Wav2Vec2 XLSR 300M CV8
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice 8
type: mozilla-foundation/common_voice_8_0
args: sr
metrics:
- name: Test WER
type: wer
value: 48.5
- name: Test CER
type: cer
value: 18.4
model-index:
- name: wav2vec2-xls-r-300m-sr-cv8
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice 8.0
type: mozilla-foundation/common_voice_8_0
args: sr
metrics:
- name: Test WER
type: wer
value: 48.53
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Robust Speech Event - Dev Data
type: speech-recognition-community-v2/dev_data
args: sr
metrics:
- name: Test WER
type: wer
value: 97.43
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Robust Speech Event - Test Data
type: speech-recognition-community-v2/eval_data
args: sr
metrics:
- name: Test WER
type: wer
value: 96.69
---
# Serbian wav2vec2-xls-r-300m-sr-cv8
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice dataset.
It achieves the following results on the evaluation set:
- Loss: 1.7302
- Wer: 0.4825
- Cer: 0.1847
Evaluation on mozilla-foundation/common_voice_8_0 gave the following results:
- WER: 0.48530097993467103
- CER: 0.18413288165227845
Evaluation on speech-recognition-community-v2/dev_data gave the following results:
- WER: 0.9718373107518604
- CER: 0.8302740620263108
The model can be evaluated using the attached `eval.py` script:
```
python eval.py --model_id comodoro/wav2vec2-xls-r-300m-sr-cv8 --dataset mozilla-foundation/common-voice_8_0 --split test --config sr
```
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 800
- mixed_precision_training: Native AMP
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|
| 5.6536 | 15.0 | 1200 | 2.9744 | 1.0 | 1.0 |
| 2.7935 | 30.0 | 2400 | 1.6613 | 0.8998 | 0.4670 |
| 1.6538 | 45.0 | 3600 | 0.9248 | 0.6918 | 0.2699 |
| 1.2446 | 60.0 | 4800 | 0.9151 | 0.6452 | 0.2398 |
| 1.0766 | 75.0 | 6000 | 0.9110 | 0.5995 | 0.2207 |
| 0.9548 | 90.0 | 7200 | 1.0273 | 0.5921 | 0.2149 |
| 0.8919 | 105.0 | 8400 | 0.9929 | 0.5646 | 0.2117 |
| 0.8185 | 120.0 | 9600 | 1.0850 | 0.5483 | 0.2069 |
| 0.7692 | 135.0 | 10800 | 1.1001 | 0.5394 | 0.2055 |
| 0.7249 | 150.0 | 12000 | 1.1018 | 0.5380 | 0.1958 |
| 0.6786 | 165.0 | 13200 | 1.1344 | 0.5114 | 0.1941 |
| 0.6432 | 180.0 | 14400 | 1.1516 | 0.5054 | 0.1905 |
| 0.6009 | 195.0 | 15600 | 1.3149 | 0.5324 | 0.1991 |
| 0.5773 | 210.0 | 16800 | 1.2468 | 0.5124 | 0.1903 |
| 0.559 | 225.0 | 18000 | 1.2186 | 0.4956 | 0.1922 |
| 0.5298 | 240.0 | 19200 | 1.4483 | 0.5333 | 0.2085 |
| 0.5136 | 255.0 | 20400 | 1.2871 | 0.4802 | 0.1846 |
| 0.4824 | 270.0 | 21600 | 1.2891 | 0.4974 | 0.1885 |
| 0.4669 | 285.0 | 22800 | 1.3283 | 0.4942 | 0.1878 |
| 0.4511 | 300.0 | 24000 | 1.4502 | 0.5002 | 0.1994 |
| 0.4337 | 315.0 | 25200 | 1.4714 | 0.5035 | 0.1911 |
| 0.4221 | 330.0 | 26400 | 1.4971 | 0.5124 | 0.1962 |
| 0.3994 | 345.0 | 27600 | 1.4473 | 0.5007 | 0.1920 |
| 0.3892 | 360.0 | 28800 | 1.3904 | 0.4937 | 0.1887 |
| 0.373 | 375.0 | 30000 | 1.4971 | 0.4946 | 0.1902 |
| 0.3657 | 390.0 | 31200 | 1.4208 | 0.4900 | 0.1821 |
| 0.3559 | 405.0 | 32400 | 1.4648 | 0.4895 | 0.1835 |
| 0.3476 | 420.0 | 33600 | 1.4848 | 0.4946 | 0.1829 |
| 0.3276 | 435.0 | 34800 | 1.5597 | 0.4979 | 0.1873 |
| 0.3193 | 450.0 | 36000 | 1.7329 | 0.5040 | 0.1980 |
| 0.3078 | 465.0 | 37200 | 1.6379 | 0.4937 | 0.1882 |
| 0.3058 | 480.0 | 38400 | 1.5878 | 0.4942 | 0.1921 |
| 0.2987 | 495.0 | 39600 | 1.5590 | 0.4811 | 0.1846 |
| 0.2931 | 510.0 | 40800 | 1.6001 | 0.4825 | 0.1849 |
| 0.276 | 525.0 | 42000 | 1.7388 | 0.4942 | 0.1918 |
| 0.2702 | 540.0 | 43200 | 1.7037 | 0.4839 | 0.1866 |
| 0.2619 | 555.0 | 44400 | 1.6704 | 0.4755 | 0.1840 |
| 0.262 | 570.0 | 45600 | 1.6042 | 0.4751 | 0.1865 |
| 0.2528 | 585.0 | 46800 | 1.6402 | 0.4821 | 0.1865 |
| 0.2442 | 600.0 | 48000 | 1.6693 | 0.4886 | 0.1862 |
| 0.244 | 615.0 | 49200 | 1.6203 | 0.4765 | 0.1792 |
| 0.2388 | 630.0 | 50400 | 1.6829 | 0.4830 | 0.1828 |
| 0.2362 | 645.0 | 51600 | 1.8100 | 0.4928 | 0.1888 |
| 0.2224 | 660.0 | 52800 | 1.7746 | 0.4932 | 0.1899 |
| 0.2218 | 675.0 | 54000 | 1.7752 | 0.4946 | 0.1901 |
| 0.2201 | 690.0 | 55200 | 1.6775 | 0.4788 | 0.1844 |
| 0.2147 | 705.0 | 56400 | 1.7085 | 0.4844 | 0.1851 |
| 0.2103 | 720.0 | 57600 | 1.7624 | 0.4848 | 0.1864 |
| 0.2101 | 735.0 | 58800 | 1.7213 | 0.4783 | 0.1835 |
| 0.1983 | 750.0 | 60000 | 1.7452 | 0.4848 | 0.1856 |
| 0.2015 | 765.0 | 61200 | 1.7525 | 0.4872 | 0.1869 |
| 0.1969 | 780.0 | 62400 | 1.7443 | 0.4844 | 0.1852 |
| 0.2043 | 795.0 | 63600 | 1.7302 | 0.4825 | 0.1847 |
### Framework versions
- Transformers 4.16.2
- Pytorch 1.10.1+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0