SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories Paper • 2409.07440 • Published Sep 11, 2024 • 8
Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners Paper • 2204.07689 • Published Apr 16, 2022
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs Paper • 2311.04892 • Published Nov 8, 2023 • 1
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models Paper • 2404.18400 • Published Apr 29, 2024 • 1
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26, 2024 • 33