LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 14 days ago
Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation Paper • 2606.13657 • Published 19 days ago
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 22 days ago
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published May 27