---
datasets: SberDevices/Golos
---

# **Acoustic and language models**

The acoustic model is built on the [QuartzNet15x5](https://arxiv.org/pdf/1910.10261.pdf) architecture and trained with the [NeMo toolkit](https://github.com/NVIDIA/NeMo/tree/r1.0.0b4).

Three n-gram language models were created with the [KenLM Language Model Toolkit](https://kheafield.com/code/kenlm):

* LM built on the [Common Crawl](https://commoncrawl.org) Russian dataset
* LM built on the [Golos](https://huggingface.co/datasets/SberDevices/Golos) train set
* LM built on the [Common Crawl](https://commoncrawl.org) and [Golos](https://huggingface.co/datasets/SberDevices/Golos) datasets together (50/50)

| Archives                 | Size       | Links               |
|--------------------------|------------|---------------------|
| QuartzNet15x5_golos.nemo | 68 MB      | https://sc.link/ZMv |
| KenLMs.tar               | 4.8 GB     | https://sc.link/YL0 |

Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in [ML Space](https://sbercloud.ru/ru/aicloud/mlspace), a full-cycle machine learning development platform for DS-team collaboration based on the Christofari supercomputer.

## **Evaluation**

Word Error Rate (WER, in percent) for each decoder and test set:

| Decoder \ Test set                         | Crowd test | Farfield test | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
|--------------------------------------------|------------|---------------|---------------------|----------------------|
| Greedy decoder                             | 4.389 %    | 14.949 %      | 9.314 %             | 11.278 %             |
| Beam Search with Common Crawl LM           | 4.709 %    | 12.503 %      | 6.341 %             | 7.976 %              |
| Beam Search with Golos train set LM        | 3.548 %    | 12.384 %      | -                   | -                    |
| Beam Search with Common Crawl and Golos LM | 3.318 %    | 11.488 %      | 6.4 %               | 8.06 %               |

<sup>1</sup> [Common Voice](https://commonvoice.mozilla.org): Mozilla's initiative to help teach machines how real people speak.
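
The evaluation table reports Word Error Rate, so a small illustration of the metric may help when reading it. The sketch below assumes the standard definition (word-level edit distance divided by the number of reference words); it is not the evaluation script used for this model card, and the function names `word_error_rate` and `relative_wer_reduction` are hypothetical helpers introduced only for this example. Text normalization (casing, punctuation, "ё"/"е") is not handled here and would affect real scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Assumes whitespace tokenization; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline: float, improved: float) -> float:
    """Return the relative WER reduction of `improved` over `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("включи свет на кухне", "включи свет в кухне"))

    # Crowd test values from the table: greedy decoder vs. beam search
    # with the combined Common Crawl and Golos LM.
    print(f"{relative_wer_reduction(4.389, 3.318):.1%}")
```

With the Crowd test numbers from the table, the combined Common Crawl and Golos LM lowers WER by roughly a quarter relative to greedy decoding. Corpus-level WER is usually computed by summing errors and reference words over all utterances rather than averaging per-utterance scores; which variant was used for the table is not stated here.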
## **Resources**

[[arxiv.org] Golos: Russian Dataset for Speech Research](https://arxiv.org/abs/2106.10161)

[[habr.com] Golos: the largest manually annotated Russian-language speech dataset, now publicly available](https://habr.com/ru/company/sberdevices/blog/559496/) (in Russian)

[[habr.com] How to improve Russian speech recognition to 3% WER using open data](https://habr.com/ru/company/sberdevices/blog/569082/) (in Russian)