metadata
title: README
emoji: 🦀
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
Models and data associated with the research project "Autonomous Evaluation and Refinement of Digital Agents".
Paper | Code
We design and use model-based evaluators to both evaluate and autonomously refine the performance of digital agents. Experiments show that domain-general automated evaluators can significantly improve the performance of digital agents, without any extra supervision.
Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
UC Berkeley, University of Michigan