---
title: README
emoji: 🦀
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---

Models and data associated with the research project "Autonomous Evaluation and Refinement of Digital Agents".

Paper | Code

We design and use model-based evaluators to both evaluate and autonomously refine the performance of digital agents. Experiments show that domain-general automated evaluators can significantly improve the performance of digital agents, without any extra supervision.

Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

UC Berkeley, University of Michigan