metadata

license: cc-by-4.0
language:
  - ru
library_name: nemo

Input

This model expects input text that has been converted to IPA-like phonemic transcriptions. See this g2p model for converting plain Russian text to phonemes. Plain text can be fed in directly, but the output quality will be low.
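A quick pre-flight check can catch text that has not yet been passed through g2p. The sketch below is only an illustration: the helper name `looks_like_plain_russian` is hypothetical, the sample transcription is made up rather than real g2p output, and the check assumes that the phoneme transcriptions contain no Cyrillic letters.

```python
import re

# Assumption: phoneme transcriptions contain no Cyrillic letters, so any
# character in the basic Cyrillic block (U+0400..U+04FF) signals plain text.
_CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def looks_like_plain_russian(text: str) -> bool:
    """Return True if the text still contains at least one Cyrillic letter."""
    return bool(_CYRILLIC.search(text))


# Illustrative inputs only; the second string is not real g2p output.
for sample in ["привет", "prʲɪvʲet"]:
    if looks_like_plain_russian(sample):
        print(f"{sample!r}: plain text, convert to phonemes first")
    else:
        print(f"{sample!r}: no Cyrillic letters found")
```

If the actual transcription alphabet differs from this assumption, the character range in the regular expression would need to be adjusted.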

Output

This model generates mel spectrograms. A separate vocoder is needed to turn them into audio waveforms.
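To relate a generated spectrogram to audio length, the number of mel frames can be converted to seconds. The sketch below uses the 22050 Hz sampling rate given in the Datasets section; the hop length of 256 samples is an assumption (a common setting for 22050 Hz TTS configurations), not a value stated in this card, and the function name is hypothetical.

```python
SAMPLE_RATE = 22050  # Hz, as stated in the Datasets section
HOP_LENGTH = 256     # samples per mel frame; ASSUMPTION, check the model config


def mel_frames_to_seconds(
    n_frames: int,
    sample_rate: int = SAMPLE_RATE,
    hop_length: int = HOP_LENGTH,
) -> float:
    """Approximate audio duration covered by n_frames mel frames."""
    if n_frames < 0:
        raise ValueError("n_frames must be non-negative")
    return n_frames * hop_length / sample_rate


# Example: duration represented by a spectrogram with 430 frames.
print(f"{mel_frames_to_seconds(430):.2f} s")
```

The real hop length should be read from the model configuration before relying on these numbers.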

Training

The model was trained with the NeMo toolkit [1] for more than 1000 epochs.

Datasets

This model was trained on the RUSLAN [2] corpus (single speaker, male voice), sampled at 22050 Hz.

References

[1] NVIDIA NeMo Toolkit
[2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham.