Agent-Eval-Refine


Recent Activity

Jiayi-Pan authored a paper 5 months ago

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Jiayi-Pan authored a paper 6 months ago

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

594zyc updated a dataset 9 months ago

Agent-Eval-Refine/Agent-Trajectories


Organization Card


Models and data associated with the research project "Autonomous Evaluation and Refinement of Digital Agents."

Paper | Code

We design model-based evaluators and use them both to evaluate digital agents and to autonomously refine their performance. Experiments show that domain-general automated evaluators can significantly improve the performance of digital agents without any extra supervision.
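One way an evaluator can drive refinement without extra supervision is filtered behavior cloning. The agent's own trajectories are scored, and only those the evaluator judges successful are kept as fine-tuning data. The sketch below illustrates only that filtering step and is not the project's code. `Trajectory`, `filter_successful`, and `toy_evaluator` are hypothetical names, and the string-matching evaluator stands in for a real model-based evaluator.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    # Hypothetical record of one agent episode.
    instruction: str
    actions: List[str]
    final_screen: str  # hypothetical: text description of the last screen


# An evaluator maps a trajectory to a success/failure judgment.
Evaluator = Callable[[Trajectory], bool]


def filter_successful(trajs: List[Trajectory], evaluator: Evaluator) -> List[Trajectory]:
    """Keep only trajectories the evaluator judges successful (filtered BC data)."""
    return [t for t in trajs if evaluator(t)]


def toy_evaluator(t: Trajectory) -> bool:
    # Hypothetical stand-in for a model-based evaluator: "success" if the
    # instruction text appears in the final screen description.
    return t.instruction.lower() in t.final_screen.lower()


trajs = [
    Trajectory("open settings", ["tap(Settings)"], "Screen shows: Open Settings page"),
    Trajectory("open camera", ["tap(Photos)"], "Screen shows: Photos library"),
]
kept = filter_successful(trajs, toy_evaluator)
```

Only the first trajectory is kept here. In a real pipeline the kept set would then serve as fine-tuning data for the agent policy. Swapping in a stronger evaluator changes which episodes survive, but it leaves the filtering logic unchanged.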

Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

UC Berkeley, University of Michigan

Spaces (1)

Captioner Demo

Models (3)

Agent-Eval-Refine/Captioner

Text Generation • Updated Apr 8

Agent-Eval-Refine/CogAgent-iOS-SelfTrain

Agent-Eval-Refine/CogAgent-iOS-FilteredBC

Datasets (2)

Agent-Eval-Refine/Agent-Trajectories

Updated Apr 12

Agent-Eval-Refine/GUI-Dense-Descriptions

Updated Apr 2