Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games Paper • 2606.19338 • Published 11 days ago • 48
HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry Paper • 2606.14249 • Published 16 days ago • 49
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time? Paper • 2606.05553 • Published 24 days ago • 50
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining Paper • 2606.17200 • Published 13 days ago • 51
CoVEBench: Can Video Editing Models Handle Complex Instructions? Paper • 2606.08415 • Published 21 days ago • 51
SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning Paper • 2606.10804 • Published 19 days ago • 51
MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction Paper • 2606.18558 • Published 11 days ago • 53
Benchmarking Visual State Tracking in Multimodal Video Understanding Paper • 2606.03920 • Published 26 days ago • 52
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research Paper • 2606.09730 • Published 20 days ago • 54
From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain Paper • 2605.23895 • Published May 22 • 54
TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders Paper • 2606.09323 • Published 20 days ago • 53
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations Paper • 2606.05563 • Published 24 days ago • 55
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories Paper • 2606.13578 • Published 17 days ago • 56
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 27 days ago • 57
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Paper • 2606.17861 • Published 12 days ago • 58
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses Paper • 2606.02373 • Published 27 days ago • 59