---
license: cc-by-4.0
language:
- ru
library_name: nemo
---

### Input

This model expects text converted to IPA-like transcriptions. See this [g2p model](https://huggingface.co/bene-ges/ru_g2p_ipa_bert_large) for converting plain Russian text to phonemes. Plain text also works as input, but the output quality will be low.

### Output

This model generates mel spectrograms.

## Training

The model was trained with the NeMo toolkit [1] for more than 1000 epochs.

### Datasets

This model was trained on the [RUSLAN](https://ruslan-corpus.github.io/) [2] corpus (single speaker, male voice), sampled at 22050 Hz.

## References

- [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
- [2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham
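
Because the model expects IPA-like transcriptions and gives low quality on plain text, it can help to check the input before synthesis. The sketch below is one possible guard, not part of NeMo or of the g2p model: `contains_cyrillic` and `classify_tts_input` are hypothetical helper names, and the check assumes that phonemized input contains no Cyrillic letters. The sample IPA-like string is illustrative only and is not actual output of the g2p model.

```python
import unicodedata


def contains_cyrillic(text: str) -> bool:
    """Return True if any alphabetic character in `text` is a Cyrillic letter."""
    for ch in text:
        # unicodedata.name() returns "" (the default) for unnamed characters.
        if ch.isalpha() and "CYRILLIC" in unicodedata.name(ch, ""):
            return True
    return False


def classify_tts_input(text: str) -> str:
    """Hypothetical guard: label the input as plain text or phonemes.

    Assumption: IPA-like transcriptions contain no Cyrillic letters,
    so any Cyrillic letter means the text was not phonemized.
    """
    return "plain_text" if contains_cyrillic(text) else "phonemes"


if __name__ == "__main__":
    for sample in ["привет", "prʲɪvʲet"]:
        kind = classify_tts_input(sample)
        if kind == "plain_text":
            print(f"{sample!r}: plain Russian text, run g2p first for better quality")
        else:
            print(f"{sample!r}: looks phonemized")
```

A check like this only flags unconverted text; the conversion itself still has to be done by a g2p model such as the one linked in the Input section.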