---
datasets: SberDevices/Golos
---
# Acoustic and language models

The acoustic model uses the [QuartzNet15x5](https://arxiv.org/pdf/1910.10261.pdf) architecture and was trained with the [NeMo toolkit](https://github.com/NVIDIA/NeMo/tree/r1.0.0b4).
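
QuartzNet is trained with the CTC loss and outputs, for every audio frame, a score distribution over characters plus a special blank symbol. The "Greedy decoder" row of the evaluation table corresponds to decoding these outputs without any language model. The following plain-Python sketch illustrates best-path (greedy) CTC decoding; the toy vocabulary and the index of the blank symbol are hypothetical and do not reflect the model's actual label set.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int,
) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores[t][k] is the score (probability or log-probability) of
    token k at frame t; only the argmax matters, so either works.
    """
    chars: List[str] = []
    prev: Optional[int] = None
    for frame in frame_scores:
        # Most likely token at this frame.
        idx = max(range(len(frame)), key=frame.__getitem__)
        # Repeated tokens merge into one emission; a blank between two
        # identical tokens keeps them as separate characters.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical 4-symbol vocabulary; index 3 is the CTC blank.
    vocab = ["а", "д", " ", "<blank>"]

    def one_hot(ids: Sequence[int]) -> List[List[float]]:
        return [[0.9 if k == t else 0.1 / 3 for k in range(len(vocab))] for t in ids]

    print(ctc_greedy_decode(one_hot([1, 1, 3, 0, 0]), vocab, blank_id=3))  # да
```

Because only the per-frame argmax is used, this decoder cannot correct spelling with linguistic context; that is the role of the language models described next.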


Three n-gram language models were built with the [KenLM Language Model Toolkit](https://kheafield.com/code/kenlm):

* an LM trained on the Russian part of [Common Crawl](https://commoncrawl.org);
* an LM trained on the [Golos](https://huggingface.co/datasets/SberDevices/Golos) train set;
* an LM trained on [Common Crawl](https://commoncrawl.org) and [Golos](https://huggingface.co/datasets/SberDevices/Golos) combined (50/50).
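
During beam search, an n-gram LM is commonly combined with the acoustic model by shallow fusion, ranking each hypothesis by `log P_AM + alpha * log P_LM + beta * N_words`, where `alpha` weights the LM and `beta` is a word insertion bonus. This card does not state which decoder or weights were used, so the following is only a sketch of the idea: it re-ranks a small n-best list using a toy add-one-smoothed bigram model as a stand-in for a KenLM model (KenLM itself estimates modified Kneser-Ney models). The corpus, hypotheses, acoustic scores, `alpha`, and `beta` are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


class ToyBigramLM:
    """Add-one-smoothed bigram LM; a stand-in for a real KenLM model."""

    def __init__(self, sentences: Sequence[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        words = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            words.update(tokens[1:])  # every predictable token, incl. </s>
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        self.vocab_size = len(words)

    def log_prob(self, sentence: str) -> float:
        """Natural-log probability of the sentence, including </s>."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            # Laplace smoothing: (c(prev, word) + 1) / (c(prev) + V)
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def rescore(
    nbest: List[Tuple[str, float]],
    lm: ToyBigramLM,
    alpha: float,
    beta: float,
) -> List[Tuple[str, float]]:
    """Re-rank (text, acoustic log-prob) pairs by the shallow-fusion score."""
    scored = [
        (text, am_score + alpha * lm.log_prob(text) + beta * len(text.split()))
        for text, am_score in nbest
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    lm = ToyBigramLM(["включи свет", "включи музыку", "выключи свет"])
    # The acoustic model slightly prefers the misspelled hypothesis.
    nbest = [("включи свед", -2.0), ("включи свет", -2.1)]
    print(rescore(nbest, lm, alpha=0.0, beta=0.0)[0][0])  # включи свед
    print(rescore(nbest, lm, alpha=0.5, beta=0.0)[0][0])  # включи свет
```

With `alpha=0` the ranking is purely acoustic; a positive `alpha` lets the LM prefer word sequences it has seen, which is the kind of effect the beam-search rows in the evaluation table measure.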

| Archive | Size | Link |
|--------------------------|--------|---------------------|
| QuartzNet15x5_golos.nemo | 68 MB | https://sc.link/ZMv |
| KenLMs.tar | 4.8 GB | https://sc.link/YL0 |


Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in [ML Space](https://sbercloud.ru/ru/aicloud/mlspace), a full-cycle machine learning development platform for data science team collaboration, built on the Christofari supercomputer.


## Evaluation

Word Error Rate (WER, %) of each decoder on the test sets:


| Decoder \ Test set | Crowd test | Farfield test | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
|--------------------------------------------|------------|---------------|----------|-----------|
| Greedy decoder | 4.389 % | 14.949 % | 9.314 % | 11.278 % |
| Beam Search with Common Crawl LM | 4.709 % | 12.503 % | 6.341 % | 7.976 % |
| Beam Search with Golos train set LM | 3.548 % | 12.384 % | - | - |
| Beam Search with Common Crawl and Golos LM | 3.318 % | 11.488 % | 6.4 % | 8.06 % |


<sup>1</sup> MCV: Mozilla [Common Voice](https://commonvoice.mozilla.org), an initiative to help teach machines how real people speak.
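
WER counts the word substitutions, deletions, and insertions needed to turn a hypothesis into its reference transcript, divided by the number of reference words. A common convention for a whole test set, assumed here without confirmation from this card, is to sum errors and reference word counts over all utterances before dividing, rather than averaging per-utterance WER. A minimal plain-Python sketch, with made-up example sentences:

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev: List[int] = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER in percent: total errors / total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned")
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


if __name__ == "__main__":
    refs = ["включи свет на кухне", "какая погода"]
    hyps = ["включи свет в кухне", "какая погода завтра"]
    print(corpus_wer(refs, hyps))  # 2 errors / 6 reference words
```

Since insertions are counted, WER can exceed 100 % for very poor hypotheses.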

## Resources

[[arxiv.org] Golos: Russian Dataset for Speech Research](https://arxiv.org/abs/2106.10161)

[[habr.com] Golos, the largest manually annotated Russian-language speech dataset, is now publicly available (in Russian)](https://habr.com/ru/company/sberdevices/blog/559496/)

[[habr.com] How to improve Russian speech recognition to 3% WER using open data (in Russian)](https://habr.com/ru/company/sberdevices/blog/569082/)