Transcribe audio with emotions and events
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
Replace objects in images with new content