27 Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video · 4 authors
20 Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model · 4 authors
10 TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models · 6 authors
10 Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization · 6 authors