Amphion Vocoder Recipe

Quick Start

We provide a beginner recipe that demonstrates how to train a high-quality HiFi-GAN speech vocoder. In particular, it is also an official implementation of our paper "Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder". Some demos can be seen here.

Supported Models

A neural vocoder generates audible waveforms from acoustic representations and is a key component of current audio generation systems. Amphion currently supports a range of widely used vocoders, grouped by type:

GAN-based vocoders, for which we provide a unified recipe:
- MelGAN
- HiFi-GAN
- NSF-HiFiGAN
- BigVGAN
- APNet
Flow-based vocoders (👨‍💻 in development):
- WaveGlow
Diffusion-based vocoders, for which we provide a unified recipe:
- DiffWave
Autoregressive vocoders (👨‍💻 in development):
- WaveNet
- WaveRNN
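
For quick reference, the grouping above can be written down as a small lookup table. The following is only an illustrative sketch: `SUPPORTED_VOCODERS` and `find_family` are hypothetical names made up for this example and are not part of Amphion's API. The status strings simply restate what the list says.

```python
# Hypothetical catalog mirroring the list above; NOT part of Amphion's API.
SUPPORTED_VOCODERS = {
    "gan": {
        "status": "unified recipe",
        "models": ["MelGAN", "HiFi-GAN", "NSF-HiFiGAN", "BigVGAN", "APNet"],
    },
    "flow": {"status": "developing", "models": ["WaveGlow"]},
    "diffusion": {"status": "unified recipe", "models": ["DiffWave"]},
    "autoregressive": {"status": "developing", "models": ["WaveNet", "WaveRNN"]},
}


def find_family(model_name):
    """Return (family, status) for a model name, or None if it is not listed.

    The comparison is case-insensitive, so "hifi-gan" matches "HiFi-GAN".
    """
    wanted = model_name.lower()
    for family, entry in SUPPORTED_VOCODERS.items():
        if any(name.lower() == wanted for name in entry["models"]):
            return family, entry["status"]
    return None


if __name__ == "__main__":
    for name in ("HiFi-GAN", "WaveRNN", "SomeOtherVocoder"):
        print(name, "->", find_family(name))
```

A table like this makes it easy to check whether a given model already has a unified recipe before trying to train it.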