Trimming the Long-Tail of Visual World Modeling Evaluation Paper • 2606.24256 • Published 9 days ago
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 11 days ago