Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 34 • 6
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9, 2024 • 17 • 4
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024 • 32 • 3
Guiding a Diffusion Model with a Bad Version of Itself Paper • 2406.02507 • Published Jun 4, 2024 • 17 • 1
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Paper • 2406.02523 • Published Jun 4, 2024 • 12 • 1
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Paper • 2406.02511 • Published Jun 4, 2024 • 11 • 2
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published Jun 4, 2024 • 18 • 3
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024 • 34 • 2
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5, 2024 • 11 • 1
Item-Language Model for Conversational Recommendation Paper • 2406.02844 • Published Jun 5, 2024 • 12 • 1
Searching Priors Makes Text-to-Video Synthesis Better Paper • 2406.03215 • Published Jun 5, 2024 • 14 • 2