Acoustic and language models

The acoustic model is built on the QuartzNet15x5 architecture and trained with the NeMo toolkit.

Three n-gram language models (LMs) were created with the KenLM Language Model Toolkit:

LM built on the Common Crawl Russian dataset
LM built on the Golos train set
LM built on the Common Crawl and Golos datasets together (50/50)
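To illustrate how an n-gram LM of this kind is typically combined with acoustic scores, here is a minimal sketch of n-best rescoring. It is not the decoder used for the results in this card. The function names, the weights `alpha` and `beta`, and the file name `lm.binary` are assumptions made for illustration. The only KenLM calls used are `kenlm.Model(path)` and `Model.score(sentence, bos=True, eos=True)` from the KenLM Python module.

```python
from typing import Callable, List, Tuple


def rescore(
    candidates: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    alpha: float = 0.5,
    beta: float = 1.0,
) -> str:
    """Pick the best transcript from (text, acoustic_log_score) pairs.

    total = acoustic + alpha * lm_score(text) + beta * word_count
    alpha and beta are illustrative weights, not values from this model card.
    """
    def total(item: Tuple[str, float]) -> float:
        text, acoustic = item
        return acoustic + alpha * lm_score(text) + beta * len(text.split())

    return max(candidates, key=total)[0]


def load_kenlm_scorer(lm_path: str) -> Callable[[str], float]:
    """Wrap a KenLM model as a scoring function (log10 probability)."""
    import kenlm  # imported lazily; requires the kenlm Python module

    model = kenlm.Model(lm_path)
    return lambda text: model.score(text, bos=True, eos=True)


# Hypothetical usage, assuming an extracted LM file named "lm.binary":
# scorer = load_kenlm_scorer("lm.binary")
# best = rescore([("привет мир", -4.0), ("привет мор", -3.5)], scorer)
```

The scorer is passed in as a plain callable, so the rescoring logic can be tried with any scoring function before a multi-gigabyte LM is downloaded.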

Archives	Size	Links
QuartzNet15x5_golos.nemo	68 MB	https://sc.link/ZMv
KenLMs.tar	4.8 GB	https://sc.link/YL0
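Once the acoustic model archive is downloaded, it can presumably be loaded with NeMo's generic checkpoint API. The sketch below assumes that NeMo is installed and that the checkpoint restores as an `EncDecCTCModel`. The helper name `transcribe_with_quartznet` and the file `sample.wav` are hypothetical, and the exact return type of `transcribe` depends on the NeMo version.

```python
from typing import List


def transcribe_with_quartznet(model_path: str, wav_paths: List[str]) -> list:
    """Restore a .nemo checkpoint and transcribe audio files (greedy decoding)."""
    # Imported lazily so this module can be imported without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Assumption: the checkpoint is a CTC model, as QuartzNet models usually are.
    model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path=model_path)
    model.eval()
    # The list of paths is passed positionally because the keyword name
    # differs between NeMo versions.
    return model.transcribe(wav_paths)


# Hypothetical usage, assuming a local audio file "sample.wav":
# print(transcribe_with_quartznet("QuartzNet15x5_golos.nemo", ["sample.wav"]))
```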

Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in ML Space, a full-cycle machine learning development platform for data science team collaboration, built on the Christofari supercomputer.

Evaluation

Word Error Rate (WER), in percent, on different test sets

Decoder \ Test set	Crowd test	Farfield test	MCV¹ dev	MCV¹ test
Greedy decoder	4.389 %	14.949 %	9.314 %	11.278 %
Beam Search with Common Crawl LM	4.709 %	12.503 %	6.341 %	7.976 %
Beam Search with Golos train set LM	3.548 %	12.384 %	-	-
Beam Search with Common Crawl and Golos LM	3.318 %	11.488 %	6.4 %	8.06 %

¹ Common Voice - Mozilla's initiative to help teach machines how real people speak.
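For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between hypothesis and reference, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition. It is not the evaluation script behind the table above, and the function name is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for one reference/hypothesis pair."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur

    return 100.0 * prev[-1] / len(ref)
```

Over a whole test set, WER is normally computed by summing the edit distances of all utterances and dividing by the total number of reference words, rather than by averaging per-utterance values.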

Resources

[arxiv.org] Golos: Russian Dataset for Speech Research

[habr.com] Golos: the largest manually annotated Russian-language speech dataset, now openly available (in Russian)

[habr.com] How to improve Russian speech recognition to 3% WER using open data (in Russian)
