Generate images with virtual try-on or pose transfer
Engage in multi-modal conversations with images and videos
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
Interact with images and text using Qwen-VL-Max