FitDiT is a high-fidelity virtual try-on model.
Generate images with virtual try-on or pose transfer
Zero-Shot Voice Cloning with Llasa-3B (Unofficial Demo)
Generate realistic voice audio from text and audio prompts
Generate speech from text
Upgraded to v1.0!
An end-to-end (E2E) voice language model by Fish Audio.
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)