SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories • Paper 2409.07440 • Published Sep 11
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents • Paper 2407.18901 • Published Jul 26