---
license: cc-by-nc-4.0
language:
- ru
library_name: nemo
tags:
- tts
- text-to-speech
- Vocoder
---

### How to use

See the example inference pipeline for Russian TTS (G2P + FastPitch + HifiGAN) in this [notebook](https://github.com/bene-ges/nemo_compatible/blob/main/notebooks/Russian_TTS_with_IPA_G2P_FastPitch_and_HifiGAN.ipynb).
Alternatively, use this [bash script](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/ru_ipa_fastpitch_hifigan/test.sh).

### Input

This model accepts batches of mel spectrograms.
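
Spectrograms in a batch usually differ in length, so they have to be padded to a common number of frames before they can be stacked. The sketch below illustrates that step in plain Python. The `[n_mels][frames]` layout, the `pad_mel_batch` helper name, and the padding value of `0.0` are assumptions made for illustration; the layout and padding the model actually expects are defined by the notebook and scripts linked above.

```python
def pad_mel_batch(mels, pad_value=0.0):
    """Pad mel spectrograms to the same number of frames.

    Each spectrogram is a list of n_mels rows, each row a list of frames
    (an assumed layout, not taken from the model card).
    Returns the padded batch and the original frame counts.
    """
    n_mels = len(mels[0])
    lengths = [len(mel[0]) for mel in mels]
    max_len = max(lengths)

    batch = []
    for mel in mels:
        if len(mel) != n_mels:
            raise ValueError("all spectrograms must have the same number of mel bins")
        # Append pad_value to every row until it reaches max_len frames.
        batch.append([row + [pad_value] * (max_len - len(row)) for row in mel])
    return batch, lengths


# Two toy spectrograms with 2 mel bins and 3 or 1 frames (hypothetical data).
first = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
second = [[7.0], [8.0]]
batch, lengths = pad_mel_batch([first, second])
```

Keeping the original `lengths` alongside the padded batch makes it possible to trim the padded tail from each generated waveform afterwards.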

### Output

This model outputs audio at 22050 Hz.
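
When saving the generated audio, the file header must carry the same 22050 Hz rate, otherwise playback will sound too fast or too slow. The following is a minimal sketch using only the Python standard library. It assumes the waveform is available as a sequence of floats in the range [-1.0, 1.0]; the `save_wav` helper and the sine tone standing in for vocoder output are hypothetical.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # output rate stated in this model card


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the conversion cannot overflow
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)             # mono
        wf.setsampwidth(2)             # 2 bytes per sample = 16 bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))


# Stand-in for vocoder output: 0.5 s of a 440 Hz tone (hypothetical data).
demo = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 2)]
save_wav(demo, "demo.wav")
```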

## Training

The model was trained with the NeMo toolkit [1] for several epochs.
The full training script is available [here](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/ru_ipa_fastpitch_hifigan/train.sh).

### Datasets

This model was trained on the [RUSLAN](https://ruslan-corpus.github.io/) corpus [2] (single speaker, male voice), sampled at 22050 Hz.

## References
- [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
- [2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham