WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 8 days ago • 99
InterleaveThinker: Reinforcing Agentic Interleaved Generation Paper • 2606.13679 • Published 5 days ago • 77
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 6 days ago • 110
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding Paper • 2605.29707 • Published 19 days ago • 145
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 15 days ago • 228
DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization Paper • 2605.31455 • Published 18 days ago • 6
Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents Paper • 2605.22608 • Published 26 days ago • 8