WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 8 days ago
CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists Paper • 2605.26029 • Published 19 days ago
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning Paper • 2605.22138 • Published 26 days ago