Model description
This repo contains the model and the notebook for the tutorial "MelGAN-based spectrogram inversion using feature matching", which implements MelGAN to invert spectrograms back into audio.
Full credits go to Darshan Deshpande
Reproduced by Vu Minh Chien
Motivation: Autoregressive vocoders have been ubiquitous for the majority of the history of speech processing, but for most of their existence they have lacked parallelism. MelGAN is a non-autoregressive, fully convolutional vocoder architecture used for purposes ranging from spectral inversion and speech enhancement to present-day state-of-the-art speech synthesis when used as a decoder with models like Tacotron2 or FastSpeech that convert text to mel spectrograms.
The LJSpeech dataset was used in this tutorial. It is primarily used for text-to-speech and consists of 13,100 discrete speech samples taken from 7 non-fiction books, with a total length of approximately 24 hours.
Intended uses & limitations
The MelGAN implemented in this tutorial is similar to the original implementation. The only difference is the padding method for convolutions: 'same' padding is used here instead of reflection padding.
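To make the padding difference concrete, here is a minimal pure-Python sketch comparing the two schemes on a toy signal. It assumes a stride-1 convolution and takes 'same' to mean zero padding that preserves the input length. The helper names (`pad_same`, `pad_reflect`, `conv1d_valid`) are hypothetical and are not taken from the notebook.

```python
def pad_same(signal, kernel_size):
    """Zero-pad so that a stride-1 convolution keeps the input length."""
    total = kernel_size - 1
    left = total // 2
    right = total - left  # any odd remainder goes on the right
    return [0.0] * left + list(signal) + [0.0] * right


def pad_reflect(signal, kernel_size):
    """Mirror the signal around its edges without repeating the edge sample.

    Assumes the signal is longer than the padding on each side.
    """
    signal = list(signal)
    total = kernel_size - 1
    left = total // 2
    right = total - left
    head = signal[1:left + 1][::-1]
    tail = signal[-right - 1:-1][::-1] if right else []
    return head + signal + tail


def conv1d_valid(signal, kernel):
    """Stride-1 cross-correlation with no implicit padding."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 1.0, 1.0]

same_out = conv1d_valid(pad_same(x, len(kernel)), kernel)
reflect_out = conv1d_valid(pad_reflect(x, len(kernel)), kernel)

print(same_out)     # edges are computed against zeros
print(reflect_out)  # edges are computed against mirrored samples
```

Both outputs have the same length as the input and agree in the interior. They differ only at the boundaries, which is where the two padding schemes supply different values.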
Training hyperparameters
The following hyperparameters were used during training:
- generator_learning_rate: 1e-5
- discriminator_learning_rate: 1e-6
- train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 20
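As a rough illustration, the values listed above could be collected into a configuration and passed to Keras optimizers as sketched below. The names `HPARAMS`, `adam_kwargs`, and `build_optimizers` are hypothetical, and using `tensorflow.keras` is an assumption. The notebook's actual training code may be organized differently.

```python
# Hyperparameters copied from the list above; the dictionary keys are illustrative.
HPARAMS = {
    "generator_learning_rate": 1e-5,
    "discriminator_learning_rate": 1e-6,
    "train_batch_size": 16,
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-08,
    "num_epochs": 20,
}


def adam_kwargs(learning_rate, hparams=HPARAMS):
    """Translate the listed Adam settings into Keras-style keyword arguments."""
    beta_1, beta_2 = hparams["adam_betas"]
    return {
        "learning_rate": learning_rate,
        "beta_1": beta_1,
        "beta_2": beta_2,
        "epsilon": hparams["adam_epsilon"],
    }


def build_optimizers(hparams=HPARAMS):
    """Create separate Adam optimizers for the generator and the discriminator.

    TensorFlow is imported lazily (an assumption about the environment) so the
    configuration above can be inspected without it installed.
    """
    from tensorflow import keras

    gen_opt = keras.optimizers.Adam(
        **adam_kwargs(hparams["generator_learning_rate"], hparams)
    )
    disc_opt = keras.optimizers.Adam(
        **adam_kwargs(hparams["discriminator_learning_rate"], hparams)
    )
    return gen_opt, disc_opt
```

Keeping two optimizers reflects the list above, which gives the generator and the discriminator different learning rates.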
Model Plot