Base Model for Mono-Channel Completion
Co-Speech Gesture Video Generation
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
Stable Diffusion Fine-Tuned Version