Update README.md
README.md
CHANGED
@@ -52,10 +52,10 @@ We evaluated Surfer-H on the [WebVoyager](https://arxiv.org/pdf/2401.13919) benc
 </div>
 
 We’ve tested multiple configurations, from GPT-4-powered agents to 100% open Holo1 setups. Among them, the fully Holo1-based agents offered the strongest tradeoff between accuracy and cost:
-- Surfer-H + Holo1-7B: 92.2% accuracy at
-- Surfer-H + GPT-
-- Surfer-H +
-- Surfer-H +
+- Surfer-H + Holo1-7B: 92.2% accuracy at $0.13 per task
+- Surfer-H + GPT-4.1: 92.0% at $0.54 per task
+- Surfer-H + Holo1-3B: 89.7% at $0.11 per task
+- Surfer-H + GPT-4.1-mini: 88.8% at $0.26 per task
 
 This places Holo1-powered agents on the Pareto frontier, delivering the best accuracy per dollar.
 Unlike other agents that rely on custom APIs or brittle wrappers, Surfer-H operates purely through the browser — just like a real user. Combined with Holo1, it becomes a powerful, general-purpose, cost-efficient web automation system.
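
The Pareto-frontier claim in the updated lines can be checked from the four accuracy and cost figures alone. Below is a minimal Python sketch, assuming accuracy (higher is better) and cost per task (lower is better) are the only two criteria. The `Config` type and the `dominates` and `pareto_frontier` helpers are illustrative names, not part of Surfer-H or Holo1.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    accuracy: float  # WebVoyager accuracy in percent, as listed in the README
    cost: float      # USD per task, as listed in the README


CONFIGS = [
    Config("Surfer-H + Holo1-7B", 92.2, 0.13),
    Config("Surfer-H + GPT-4.1", 92.0, 0.54),
    Config("Surfer-H + Holo1-3B", 89.7, 0.11),
    Config("Surfer-H + GPT-4.1-mini", 88.8, 0.26),
]


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


if __name__ == "__main__":
    for c in pareto_frontier(CONFIGS):
        print(f"{c.name}: {c.accuracy}% at ${c.cost:.2f} per task")
```

With these figures, both Holo1 configurations sit on the frontier, while each GPT-4.1 configuration is dominated by Holo1-7B, which is both more accurate and cheaper per task.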