Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ pinned: false
 ---
 ## Model/Data associated with research project *Autonomous Evaluation and Refinement of Digital Agents*.
 
-### [Paper
+### [Paper](https://arxiv.org/abs/2404.06474) | [Code](https://github.com/Berkeley-NLP/Agent-Eval-Refine)
 
 
 We design and use model-based evaluators to both evaluate and autonomously refine the performance of digital agents. Experiments show that domain-general automated evaluators can significantly improve the performance of digital agents, without any extra supervision.