---
license: cc-by-nc-4.0
language:
- ru
library_name: nemo
tags:
- tts
- text-to-speech
- Vocoder
---

### How to use

See an example of the inference pipeline for Russian TTS (G2P + FastPitch + HifiGAN) in this [notebook](https://github.com/bene-ges/nemo_compatible/blob/main/notebooks/Russian_TTS_with_IPA_G2P_FastPitch_and_HifiGAN.ipynb).

Alternatively, use this [bash script](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/ru_ipa_fastpitch_hifigan/test.sh).
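
The pipeline chains three stages. A G2P model converts Russian text to IPA phonemes, FastPitch turns the phonemes into a mel spectrogram, and this HifiGAN vocoder turns the spectrogram into a waveform. The sketch below shows only that data flow. It is not the notebook's code, and the stage callables are hypothetical stand-ins for real models. In NeMo, the last two stages would likely correspond to `FastPitchModel.generate_spectrogram` and `HifiGanModel.convert_spectrogram_to_audio`, but check the notebook for the exact calls and arguments.

```python
from typing import Callable

import numpy as np

# Hypothetical stage signatures (illustrative stand-ins, not NeMo APIs):
G2P = Callable[[str], str]                    # text -> IPA phoneme string
AcousticModel = Callable[[str], np.ndarray]   # phonemes -> mel spectrogram [n_mels, T]
Vocoder = Callable[[np.ndarray], np.ndarray]  # mel batch [B, n_mels, T] -> audio [B, samples]


def synthesize(text: str, g2p: G2P, acoustic_model: AcousticModel, vocoder: Vocoder) -> np.ndarray:
    """Run text through G2P, the acoustic model and the vocoder; return 1-D audio."""
    phonemes = g2p(text)
    mel = acoustic_model(phonemes)           # [n_mels, T]
    audio_batch = vocoder(mel[np.newaxis])   # the vocoder expects a batch: [1, n_mels, T]
    return audio_batch[0]


if __name__ == "__main__":
    # Dummy stand-ins (assumptions): 80 mel bins, 5 frames per phoneme, 256 samples per frame.
    def dummy_g2p(text: str) -> str:
        return text.lower()

    def dummy_acoustic_model(phonemes: str) -> np.ndarray:
        return np.zeros((80, 5 * len(phonemes)), dtype=np.float32)

    def dummy_vocoder(mel_batch: np.ndarray) -> np.ndarray:
        return np.zeros((mel_batch.shape[0], mel_batch.shape[2] * 256), dtype=np.float32)

    audio = synthesize("Привет", dummy_g2p, dummy_acoustic_model, dummy_vocoder)
    print(audio.shape)  # (7680,)
```

Passing the stages in as arguments keeps the sketch independent of any particular model class, so real G2P, FastPitch and HifiGAN wrappers can replace the dummies without changing `synthesize`.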

### Input

This model is a vocoder: it accepts batches of mel spectrograms, such as those produced by FastPitch, and converts them to waveforms.
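
Spectrograms of different lengths must be padded to a common length before they can be stacked into one batch. The sketch below assumes a `[batch, n_mels, frames]` layout with 80 mel bins and pads with the log of a small floor value (`ln(1e-5)`), which represents near-silence in a log-mel spectrogram. `N_MELS`, `PAD_VALUE` and `collate_mels` are hypothetical names, and both constants are assumptions, so take the actual values from the model's config.

```python
from typing import List, Tuple

import numpy as np

N_MELS = 80                       # assumption: check the model config
PAD_VALUE = float(np.log(1e-5))   # assumption: log-mel "silence" floor used for padding


def collate_mels(mels: List[np.ndarray], pad_value: float = PAD_VALUE) -> Tuple[np.ndarray, np.ndarray]:
    """Pad [n_mels, T_i] spectrograms to the longest T_i and stack them.

    Returns a [B, n_mels, T_max] float32 batch and a [B] array of original lengths.
    """
    if not mels:
        raise ValueError("need at least one spectrogram")
    n_mels = mels[0].shape[0]
    if any(m.ndim != 2 or m.shape[0] != n_mels for m in mels):
        raise ValueError("all spectrograms must be 2-D with the same number of mel bins")
    lengths = np.array([m.shape[1] for m in mels], dtype=np.int64)
    batch = np.full((len(mels), n_mels, int(lengths.max())), pad_value, dtype=np.float32)
    for i, m in enumerate(mels):
        batch[i, :, : m.shape[1]] = m  # copy real frames; the rest stays padded
    return batch, lengths


if __name__ == "__main__":
    mels = [np.zeros((N_MELS, 50), dtype=np.float32), np.zeros((N_MELS, 120), dtype=np.float32)]
    batch, lengths = collate_mels(mels)
    print(batch.shape, lengths.tolist())  # (2, 80, 120) [50, 120]
```

Returning the original lengths alongside the batch lets downstream code trim each waveform back to its real duration after vocoding.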

### Output

This model outputs audio sampled at 22050 Hz.
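
To play back the result correctly, write the waveform with the same 22050 Hz sample rate. The sketch below uses only NumPy and Python's standard `wave` module. It assumes the vocoder output is a mono float array roughly in [-1, 1] and converts it to 16-bit PCM. `save_wav` is a hypothetical helper, not part of NeMo.

```python
import wave

import numpy as np

SAMPLE_RATE = 22050  # this vocoder's output sample rate


def save_wav(path: str, audio: np.ndarray, sample_rate: int = SAMPLE_RATE) -> None:
    """Write a mono float waveform (expected range [-1, 1]) as a 16-bit PCM WAV file."""
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

For example, `save_wav("output.wav", audio)` writes a 1-D waveform returned by the vocoder. Writing it at any other rate would change its pitch and speed on playback.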

## Training

The NeMo toolkit [1] was used to train the model for several epochs.

The full training script is [here](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/ru_ipa_fastpitch_hifigan/train.sh).

### Datasets

This model was trained on the [RUSLAN](https://ruslan-corpus.github.io/) [2] corpus (single speaker, male voice), sampled at 22050 Hz.

## References

- [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
- [2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham.