26 Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold · 6 authors 73
2 CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training · 8 authors 4
2 UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild · 13 authors 1
1 SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities · 7 authors 1
1 VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks · 11 authors 5
1 GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework · 7 authors 1