File size: 995 Bytes
b164ceb
 
 
2b9c352
b164ceb
 
 
2b9c352
b164ceb
05a0076
 
 
b164ceb
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
### The MelGAN vocoder for StyleSpeech
#### About StyleSpeech
* StyleSpeech or Meta-StyleSpeech is a model for Multi-Speaker Adaptive Text-to-Speech Generation
* The StyleSpeech model can be trained by official implementation (https://github.com/KevinMIN95/StyleSpeech).
#### About MelGAN vocoder
* This MelGAN vocoder is used to transform the mel-spectrogram back to the waveform. 
* StyleSpeech is based on 16k Hz sampling rate, and there is no available 16k Hz multi-speaker vocoder.
* Thus I train this vocoder from scratch using Libri-TTS train-100 hour dataset. The training pipeline is the same as the official MelGAN (https://github.com/descriptinc/melgan-neurips).
* The synthesized sounds are close to the official demo with good quality.
#### Usage
* Please follow the official MelGAN (https://github.com/descriptinc/melgan-neurips) to load pre-trained checkpoint and convert your mel-spectrogram back to the waveform.

#### Training Details 
* GPU: RTX 2080Ti
* Training epoch: 3000