bene-ges/tts_ru_hifigan_ruslan

---
license: cc-by-nc-4.0
language:
- ru
library_name: nemo
tags:
- tts
- text-to-speech
- Vocoder
---

### How to use

See an example of the inference pipeline for Russian TTS (G2P + FastPitch + HiFi-GAN) in this [notebook](https://github.com/bene-ges/nemo_compatible/blob/main/notebooks/Russian_TTS_with_IPA_G2P_FastPitch_and_HifiGAN.ipynb).
Alternatively, use this [bash script](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/ru_ipa_fastpitch_hifigan/test.sh).

### Input

This model accepts batches of mel spectrograms.
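The card does not state the expected tensor layout. NeMo vocoders usually take spectrograms shaped `(batch, n_mels, frames)`, so the sketch below assumes that layout plus 80 mel bins (an assumption; check the model's config) and pads variable-length utterances into one batch with NumPy. The helper `batch_mels` and the constant `N_MELS` are hypothetical, not part of NeMo.

```python
import numpy as np

# Assumption: 80 mel bins, a common NeMo HiFi-GAN setting; verify against the model config.
N_MELS = 80


def batch_mels(mels, n_mels=N_MELS, pad_value=0.0):
    """Stack mels of shape (n_mels, T_i) into a (B, n_mels, T_max) float32 batch.

    Returns the padded batch and the original frame count of each item.
    """
    if not mels:
        raise ValueError("need at least one spectrogram")
    for m in mels:
        if m.ndim != 2 or m.shape[0] != n_mels:
            raise ValueError(f"expected shape ({n_mels}, T), got {m.shape}")
    lengths = np.array([m.shape[1] for m in mels], dtype=np.int64)
    # Fill with pad_value, then copy each spectrogram into the left part of its row.
    batch = np.full((len(mels), n_mels, int(lengths.max())), pad_value, dtype=np.float32)
    for i, m in enumerate(mels):
        batch[i, :, : m.shape[1]] = m
    return batch, lengths
```

The padded batch can then be turned into a tensor with `torch.from_numpy(batch)` and passed to the vocoder's `convert_spectrogram_to_audio(spec=...)`, the method NeMo vocoder models expose. `pad_value=0.0` is only a placeholder: for log-mel features, a value near the log floor of your feature extractor is usually closer to silence. Keep `lengths` so the padded tail can be trimmed from each output waveform.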

### Output

This model outputs audio waveforms sampled at 22050 Hz.
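A minimal sketch for saving the output with the standard-library `wave` module follows. It assumes a mono float waveform in [-1, 1] and a hop length of 256 samples per mel frame. That hop length is an assumption based on common NeMo 22050 Hz configs and is not stated in this card; with it, each frame covers about 11.6 ms of audio. `save_wav` and `expected_num_samples` are hypothetical helpers.

```python
import wave

import numpy as np

SAMPLE_RATE = 22050  # output rate stated in this model card
HOP_LENGTH = 256     # assumption: typical NeMo 22050 Hz config; check the model config


def expected_num_samples(n_frames, hop_length=HOP_LENGTH):
    """Waveform length implied by n_frames mel frames under the assumed hop length."""
    return n_frames * hop_length


def save_wav(audio, path, sample_rate=SAMPLE_RATE):
    """Write a mono float waveform in [-1, 1] as a 16-bit PCM WAV file."""
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    # Clip to the valid range, scale to int16, and force little-endian as WAV requires.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

For a padded batch, slicing `audio[i, : expected_num_samples(lengths[i])]` before saving drops the samples generated from padding frames.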

## Training

The model was trained for several epochs with the NeMo toolkit [1].
The full training script is [here](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/ru_ipa_fastpitch_hifigan/train.sh).

### Datasets

This model was trained on the [RUSLAN](https://ruslan-corpus.github.io/) corpus [2] (single speaker, male voice), sampled at 22050 Hz.
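When preparing RUSLAN (or other) audio for fine-tuning, it helps to confirm that every file really is 22050 Hz before training. The sketch below checks sample rates with the `wave` module and writes a JSON-lines manifest. The field names `audio_filepath`, `duration`, and `text` follow the common NeMo manifest convention, but whether the linked training script expects exactly these fields is an assumption. `wav_info` and `write_manifest` are hypothetical helpers.

```python
import json
import wave

TARGET_SR = 22050  # RUSLAN sample rate given in this model card


def wav_info(path):
    """Return (sample_rate, duration_in_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()
        return sr, wf.getnframes() / sr


def write_manifest(entries, manifest_path, target_sr=TARGET_SR):
    """Write one JSON record per (wav_path, text) pair; return the record count.

    Raises ValueError if any file's sample rate differs from target_sr.
    Field names are an assumption based on NeMo's usual manifest layout.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as f:
        for wav_path, text in entries:
            sr, duration = wav_info(wav_path)
            if sr != target_sr:
                raise ValueError(f"{wav_path}: {sr} Hz, expected {target_sr} Hz")
            record = {"audio_filepath": wav_path, "duration": round(duration, 3), "text": text}
            # ensure_ascii=False keeps Cyrillic text readable in the manifest.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Failing loudly on a mismatched rate is a deliberate choice here: silently mixing sample rates would make the learned mel-to-audio mapping inconsistent with the 22050 Hz output.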

## References
- [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
- [2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham.