Amphion Text-to-Speech (TTS) Recipe
Quick Start
We provide a beginner recipe that demonstrates how to train a cutting-edge TTS model. Specifically, it is Amphion's re-implementation of Vall-E, a zero-shot TTS architecture that uses a neural codec language model with discrete codes.
Supported Model Architectures
So far, Amphion TTS supports the following models or architectures:
- FastSpeech2: A non-autoregressive TTS architecture that utilizes feed-forward Transformer blocks.
- VITS: An end-to-end TTS architecture that utilizes a conditional variational autoencoder with adversarial learning.
- Vall-E: A zero-shot TTS architecture that uses a neural codec language model with discrete codes.
- NaturalSpeech2 (👨‍💻 developing): A TTS architecture that utilizes a latent diffusion model to generate natural-sounding voices.
Amphion TTS Demo
Here are some TTS samples from Amphion (👨‍💻 developing).