Amphion Text-to-Audio (TTA) Recipe

Quick Start

We provide a beginner recipe that demonstrates how to train a cutting-edge TTA model. Specifically, it is a latent diffusion model in the style of AudioLDM, Make-an-Audio, and AUDIT.

Supported Model Architectures

Amphion currently supports one latent-diffusion-based text-to-audio model.



Following AUDIT, we train it in two stages:

  1. Train the VAE, which is called AutoencoderKL in Amphion.
  2. Train the conditional latent diffusion model, which is called AudioLDM in Amphion.
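
To make the relationship between the two stages concrete, here is a minimal, dependency-free sketch of how a stage-2 training pair could be built from a stage-1 latent. It is not Amphion's code: `toy_encode` is a hypothetical stand-in for the trained AutoencoderKL encoder, and the linear noise schedule with its `beta_start`/`beta_end` values is an assumption borrowed from common DDPM-style setups, not something this recipe specifies.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed values, needs num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def toy_encode(mel_frame, scale=0.5):
    """Hypothetical stage-1 encoder: averages adjacent bins, then scales.

    A real VAE encoder is a learned network; this only mimics the fact
    that the latent is smaller than the input spectrogram.
    """
    return [
        scale * (mel_frame[i] + mel_frame[i + 1]) / 2.0
        for i in range(0, len(mel_frame) - 1, 2)
    ]


def add_noise(latent, t, alpha_bars, rng):
    """Forward diffusion: z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    noisy = [signal * z + noise * e for z, e in zip(latent, eps)]
    return noisy, eps


def mse(pred, target):
    """Mean squared error, the usual noise-prediction loss."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


# Stage 1 (frozen after training): spectrogram frame -> latent.
rng = random.Random(0)
mel_frame = [rng.random() for _ in range(8)]
latent = toy_encode(mel_frame)

# Stage 2: build one (noisy latent, target noise) pair at a random step.
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
t = rng.randrange(len(alpha_bars))
noisy_latent, target_noise = add_noise(latent, t, alpha_bars, rng)

# A real denoiser would take (noisy_latent, t, text_embedding) and be
# trained to predict target_noise; a zero prediction is used here only
# to show where the loss is computed.
loss = mse([0.0] * len(noisy_latent), target_noise)
```

The point of the split is that the diffusion model in stage 2 only ever sees latents, so the VAE from stage 1 has to be trained first and is then kept fixed while the conditional model learns to denoise.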