V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published 10 days ago • 12
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published 24 days ago • 4
Automated Movie Generation via Multi-Agent CoT Planning Paper • 2503.07314 • Published Mar 10 • 43
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles Paper • 2503.03651 • Published Mar 5 • 16
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published Mar 3 • 43
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20 • 41
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation Paper • 2502.08047 • Published Feb 12 • 27
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published Feb 11 • 44