Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published 12 days ago • 71
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper • 2503.16408 • Published 17 days ago • 39
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published 20 days ago • 28
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published 27 days ago • 15
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents Paper • 2502.18017 • Published Feb 25 • 19
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published Feb 20 • 48
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 87
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 31
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance Paper • 2312.11396 • Published Dec 18, 2023 • 11