metadata
license: cc-by-nc-4.0
language:
  - ru
library_name: nemo
tags:
  - tts
  - text-to-speech
  - Vocoder

How to use

See an example of the full inference pipeline for Russian TTS (G2P + FastPitch + HiFi-GAN) in this notebook, or use this bash script.

Input

This model accepts batches of mel spectrograms.
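Because utterances differ in length, mel spectrograms usually have to be padded before they can be stacked into a batch. The following is a minimal sketch of that step, not code from this repository. It assumes 80 mel bins, a `(batch, n_mels, frames)` layout, and a padding value of `0.0`; all three are assumptions that should be checked against the model configuration. The helper name `pad_mel_batch` is hypothetical.

```python
import numpy as np

N_MELS = 80  # assumption: number of mel bins; check the model config


def pad_mel_batch(mels, pad_value=0.0):
    """Stack variable-length (n_mels, frames) arrays into (batch, n_mels, max_frames).

    pad_value is an assumption; the value expected by the model may differ.
    """
    lengths = np.array([m.shape[1] for m in mels], dtype=np.int64)
    batch = np.full(
        (len(mels), N_MELS, int(lengths.max())), pad_value, dtype=np.float32
    )
    for i, m in enumerate(mels):
        # Copy each spectrogram into the left part of its row; the rest stays padded.
        batch[i, :, : m.shape[1]] = m
    return batch, lengths


# Two dummy spectrograms of different lengths stand in for FastPitch output.
mels = [
    np.random.randn(N_MELS, 120).astype(np.float32),
    np.random.randn(N_MELS, 95).astype(np.float32),
]
batch, lengths = pad_mel_batch(mels)
print(batch.shape, lengths.tolist())  # (2, 80, 120) [120, 95]
```

Keeping the original lengths alongside the padded batch makes it possible to trim each generated waveform back to its real duration afterwards.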

Output

This model outputs audio at 22050 Hz.
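When saving the generated waveform, the file header must carry the same 22050 Hz rate, otherwise playback speed and pitch will be wrong. The sketch below uses only the Python standard library and assumes the waveform is a mono sequence of floats in [-1, 1]; the sine tone is a placeholder for real vocoder output, and the helper name `save_wav` and the output file name are hypothetical.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 22050  # output rate of the model, as stated above


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed to lie in [-1, 1]) as 16-bit PCM."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to avoid integer overflow
        pcm += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(bytes(pcm))


# Placeholder signal: one second of a 440 Hz tone instead of vocoder output.
audio = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
out_path = os.path.join(tempfile.gettempdir(), "hifigan_output.wav")
save_wav(out_path, audio)

# Read the header back to confirm the rate and duration.
with wave.open(out_path, "rb") as f:
    rate = f.getframerate()
    duration = f.getnframes() / rate
print(rate, duration)
```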

Training

The model was trained for several epochs with the NeMo toolkit [1]. The full training script is here.

Datasets

This model was trained on the RUSLAN corpus [2] (single speaker, male voice), sampled at 22050 Hz.

References

  • [1] NVIDIA NeMo Toolkit
  • [2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham