Submitted by Weiyun1025 170 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models · 47 authors 4
Submitted by LIKirin 94 PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters · 5 authors 4
Submitted by cuijiaxing 38 Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability · 3 authors 1
Submitted by wenhu 36 VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning · 6 authors 1
Submitted by starriver030515 34 FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding · 7 authors 2
Submitted by mponty 29 Iterative Self-Training for Code Generation via Reinforced Re-Ranking · 3 authors 1
Submitted by DogNeverSleep 26 Mavors: Multi-granularity Video Representation for Multimodal Large Language Model · 15 authors 1
Submitted by xhluca 18 AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories · 10 authors 1
Submitted by AIRobotZ 16 S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models · 5 authors 2
Submitted by leoozy 14 Breaking the Data Barrier -- Building GUI Agents Through Task Generalization · 7 authors 1
Submitted by ztwang 13 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training · 4 authors 1
Submitted by brucelyu 11 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users · 21 authors 2
Submitted by codezakh 11 Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems · 5 authors 1
Submitted by yyamada 9 The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search · 8 authors 1
Submitted by LibraTree 7 VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search · 8 authors 3
Submitted by parshinsh 6 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models · 6 authors 1
Submitted by akhaliq 4 M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models · 6 authors 1
Submitted by ChrisJuan 4 EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety · 10 authors 2
Submitted by Rexhaif 2 DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? · 8 authors 1
Submitted by kpzhang996 2 MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models · 20 authors 1
Submitted by mqliu 1 LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models · 11 authors 1
Submitted by SteveZeyuZhang - DiffuMural: Restoring Dunhuang Murals with Multi-scale Diffusion · 9 authors 1
Submitted by johnhalloran - MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits · 2 authors 1