Yehor's picture
Update README.md
548345e
metadata
language:
  - uk
license: apache-2.0
tags:
  - automatic-speech-recognition
  - mozilla-foundation/common_voice_7_0
  - generated_from_trainer
  - uk
  - robust-speech-event
datasets:
  - common_voice
model-index:
  - name: wav2vec2-xls-r-1b-uk-with-lm
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice uk
          type: common_voice
          args: uk
        metrics:
          - name: Test WER
            type: wer
            value: 20.33

Ukrainian STT model (with Language Model)

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - UK dataset.

It achieves the following results on the evaluation set without the language model:

  • Loss: 0.1875
  • Wer: 0.2033
  • Cer: 0.0384

Follow our community in Telegram: https://t.me/speech_recognition_uk

Model description

On 100 test example the model shows the following results:

Without LM:

  • WER: 0.1862
  • CER: 0.0277

With LM:

  • WER: 0.1218
  • CER: 0.0190

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 20
  • total_train_batch_size: 160
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100.0
  • mixed_precision_training: Native AMP

Training results

Training Loss Epoch Step Validation Loss Wer Cer
1.2815 7.93 500 0.3536 0.4753 0.1009
1.0869 15.86 1000 0.2317 0.3111 0.0614
0.9984 23.8 1500 0.2022 0.2676 0.0521
0.975 31.74 2000 0.1948 0.2469 0.0487
0.9306 39.67 2500 0.1916 0.2377 0.0464
0.8868 47.61 3000 0.1903 0.2257 0.0439
0.8424 55.55 3500 0.1786 0.2206 0.0423
0.8126 63.49 4000 0.1849 0.2160 0.0416
0.7901 71.42 4500 0.1869 0.2138 0.0413
0.7671 79.36 5000 0.1855 0.2075 0.0394
0.7467 87.3 5500 0.1884 0.2049 0.0389
0.731 95.24 6000 0.1877 0.2060 0.0387

Framework versions

  • Transformers 4.16.0.dev0
  • Pytorch 1.10.1+cu102
  • Datasets 1.18.1.dev0
  • Tokenizers 0.11.0

Eval results on Common Voice 7 "test" (WER):

Without LM With LM (run ./eval.py)
- 14.62