Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents Paper • 2605.30621 • Published 16 days ago • 22
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation? Paper • 2604.27419 • Published Apr 30 • 13
FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration Paper • 2603.29557 • Published Mar 31 • 17
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR Paper • 2509.02522 • Published Sep 2, 2025 • 26
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Paper • 2507.13332 • Published Jul 17, 2025 • 49