Yehor
/

wav2vec2-xls-r-1b-uk-with-lm

Automatic Speech Recognition

Generated from Trainer

hf-asr-leaderboard

mozilla-foundation/common_voice_7_0

robust-speech-event

Inference Endpoints

Model card Files Files and versions Metrics Training metrics Community

Edit model card

Ukrainian STT model (with Language Model)

🇺🇦 Join Ukrainian Speech Recognition Community - https://t.me/speech_recognition_uk

⭐ See other Ukrainian models - https://github.com/egorsmkv/speech-recognition-uk

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - UK dataset.

It achieves the following results on the evaluation set without the language model:

Loss: 0.1875
Wer: 0.2033
Cer: 0.0384

Model description

On 100 test example the model shows the following results:

Without LM:

WER: 0.1862
CER: 0.0277

With LM:

WER: 0.1218
CER: 0.0190

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 20
total_train_batch_size: 160
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 100.0
mixed_precision_training: Native AMP

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
1.2815	7.93	500	0.3536	0.4753	0.1009
1.0869	15.86	1000	0.2317	0.3111	0.0614
0.9984	23.8	1500	0.2022	0.2676	0.0521
0.975	31.74	2000	0.1948	0.2469	0.0487
0.9306	39.67	2500	0.1916	0.2377	0.0464
0.8868	47.61	3000	0.1903	0.2257	0.0439
0.8424	55.55	3500	0.1786	0.2206	0.0423
0.8126	63.49	4000	0.1849	0.2160	0.0416
0.7901	71.42	4500	0.1869	0.2138	0.0413
0.7671	79.36	5000	0.1855	0.2075	0.0394
0.7467	87.3	5500	0.1884	0.2049	0.0389
0.731	95.24	6000	0.1877	0.2060	0.0387

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.1+cu102
Datasets 1.18.1.dev0
Tokenizers 0.11.0

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_7_0 with split test

python eval.py --model_id Yehor/wav2vec2-xls-r-1b-uk-with-lm --dataset mozilla-foundation/common_voice_7_0 --config uk --split test

Eval results on Common Voice 7 "test" (WER):

Without LM	With LM (run `./eval.py`)
21.52	14.62

Downloads last month: 9

Automatic Speech Recognition

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train Yehor/wav2vec2-xls-r-1b-uk-with-lm

Evaluation results

Test WER on Common Voice 7
self-reported

14.620
Test WER on Robust Speech Event - Dev Data
self-reported

48.720
Test WER on Robust Speech Event - Test Data
self-reported

40.660

View on Papers With Code