Model description

This repo contains the model and the notebook for MelGAN-based spectrogram inversion using feature matching.

Full credits go to Darshan Deshpande

Reproduced by Vu Minh Chien

Motivation: Autoregressive vocoders have been ubiquitous for the majority of the history of speech processing, but for most of their existence they have lacked parallelism. MelGAN is a non-autoregressive, fully convolutional vocoder architecture used for purposes ranging from spectral inversion and speech enhancement to present-day state-of-the-art speech synthesis when used as a decoder with models like Tacotron2 or FastSpeech that convert text to mel spectrograms.

The LJSpeech dataset was used in this tutorial. It is primarily used for text-to-speech and consists of 13,100 discrete speech samples taken from 7 non-fiction books, with a total length of approximately 24 hours.

Intended uses & limitations

The MelGAN implemented in this tutorial is similar to the original implementation. The only difference is the padding method for convolutions: we use 'same' padding instead of reflection padding.
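To make the padding difference concrete, here is a minimal, framework-free sketch in plain Python. It is not the code from the notebook: the helper names (`pad_zero`, `pad_reflect`, `conv1d_valid`), the toy signal, and the kernel are all hypothetical. It assumes a stride of 1 and an odd kernel size, in which case 'same' padding amounts to adding zeros symmetrically on both sides, while reflection padding mirrors the signal around its edge samples.

```python
def pad_zero(signal, pad):
    """Zero padding, as 'same' padding does for stride 1 and an odd kernel."""
    return [0.0] * pad + list(signal) + [0.0] * pad


def pad_reflect(signal, pad):
    """Reflection padding: mirror around the edge samples without repeating them."""
    left = list(signal[1:pad + 1])[::-1]
    right = list(signal[-pad - 1:-1])[::-1]
    return left + list(signal) + right


def conv1d_valid(signal, kernel):
    """Plain 1D sliding dot product with no implicit padding (illustrative only)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Toy input and kernel, chosen only for illustration.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 1.0, 1.0]
pad = (len(kernel) - 1) // 2  # 1 sample on each side for a kernel of size 3

same_out = conv1d_valid(pad_zero(signal, pad), kernel)
reflect_out = conv1d_valid(pad_reflect(signal, pad), kernel)

print(same_out)     # [3.0, 6.0, 9.0, 7.0]
print(reflect_out)  # [5.0, 6.0, 9.0, 10.0]
```

Both variants keep the output the same length as the input. They differ only at the borders, where zero padding pulls the edge values toward zero and reflection padding reuses nearby samples.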

Training hyperparameters

The following hyperparameters were used during training:

  • generator_learning_rate: 1e-5
  • discriminator_learning_rate: 1e-6
  • train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • num_epochs: 20
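As a rough sanity check on the scale of this training run, the batch size and epoch count above can be turned into an approximate number of optimizer steps. The sketch below assumes that all 13,100 LJSpeech clips are used for training and that the final partial batch is kept. Neither assumption is confirmed by this card, and the variable names are illustrative.

```python
import math

# Values taken from the card; using the full dataset for training is an assumption.
num_samples = 13_100
train_batch_size = 16
num_epochs = 20

# The last batch of each epoch is smaller than 16, hence the ceiling.
steps_per_epoch = math.ceil(num_samples / train_batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch)  # 819
print(total_steps)      # 16380
```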

Model Plot

[Figures: model demo and model plot]
