---
title: README
emoji: 🦀
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---
## Models and data for the research project *Autonomous Evaluation and Refinement of Digital Agents*

### [Paper](https://arxiv.org/abs/2404.06474) | [Code](https://github.com/Berkeley-NLP/Agent-Eval-Refine)


We design model-based evaluators and use them both to evaluate digital agents and to autonomously refine their performance. Our experiments show that domain-general automated evaluators can significantly improve agent performance without any extra supervision.


[Jiayi Pan](https://www.jiayipan.me/), [Yichi Zhang](https://sled.eecs.umich.edu/author/yichi-zhang/), [Nicholas Tomlin](https://people.eecs.berkeley.edu/~nicholas_tomlin/), [Yifei Zhou](https://yifeizhou02.github.io/), [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/), [Alane Suhr](https://www.alanesuhr.com/)

UC Berkeley, University of Michigan