Submitted by nebulae09 36 Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM · 12 authors 1
Submitted by akhaliq 34 DAPO: An Open-Source LLM Reinforcement Learning System at Scale · 35 authors 1
Submitted by carboncoo 22 DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding · 8 authors 1
Submitted by cckevinn 18 CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era · 10 authors 1
Submitted by ZhaoyangLyu 17 Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation · 12 authors 1
Submitted by akhaliq 9 Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control · 39 authors 1
Submitted by kpzhang996 7 MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification · 9 authors 1
Submitted by zhangysk 6 FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis · 9 authors 1
Submitted by Lingaaaaaaa 5 Temporal Consistency for LLM Reasoning Process Error Identification · 7 authors 1
Submitted by BestWishYsh 5 Concat-ID: Towards Universal Identity-Preserving Video Synthesis · 5 authors 1
Submitted by edaxberger 5 MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs · 11 authors 1
Submitted by kpzhang996 4 PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models · 11 authors 1
Submitted by Spravil 4 Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models · 3 authors 1
Submitted by PengDa02 3 Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs · 9 authors 2
Submitted by jacklishufan 3 Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection · 7 authors 1
Submitted by yuwendu 2 RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation · 9 authors 1
Submitted by DamianBoborzi 1 MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling · 6 authors 1
Submitted by Mingtongz 1 KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation · 3 authors 1
Submitted by ZhiyuanZeng 1 EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees · 4 authors 1
Submitted by cxliu0314 - CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving · 5 authors 1