CoRe^2: Collect, Reflect and Refine to Generate Better and Faster Paper • 2503.09662 • Published 4 days ago • 28
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Paper • 2503.05978 • Published 9 days ago • 32
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation Paper • 2502.18302 • Published 19 days ago • 4
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published 20 days ago • 51
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published 20 days ago • 73
Dynamic Concepts Personalization from Single Videos Paper • 2502.14844 • Published 24 days ago • 16
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 24 days ago • 129
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published about 1 month ago • 27
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11 • 34
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 35
Generating Multi-Image Synthetic Data for Text-to-Image Customization Paper • 2502.01720 • Published Feb 3 • 8
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 62
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published Jan 27 • 18
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published Jan 23 • 37
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published Jan 8 • 14
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing Paper • 2412.04280 • Published Dec 5, 2024 • 14