TITLE = '<h1 align="center" id="space-title">🏆 WebWalkerQA Leaderboard</h1>'
INTRO_TEXT = f"""
## 📖 About
This leaderboard showcases the performance of models on the **WebWalkerQA benchmark**. WebWalkerQA is a collection of question-answering datasets designed to test models' ability to answer questions about web pages.
"""
HOW_TO = f"""
## 🗂️ Data
The WebWalkerQA dataset is available on 🤗 [Hugging Face](https://huggingface.co/datasets/callanwu/WebWalkerQA). It comprises **680 question-answer pairs**, each linked to a corresponding web page. The benchmark is divided into two key components:
- **Agent 🤖️**
- **RAG-system 🔍**
## 🚀 How to Submit Your Method
### 📝 Submission Steps:
To list your method's performance on this leaderboard, email **jialongwu@alibaba-inc.com** or **jialongwu@seu.edu.cn** with the following:
1. A JSONL file in the format:
```jsonl
{{"question": "question_text", "prediction": "predicted_answer_text"}}
```
2. Include the following details in your email:
- **User Name**
- **Type** (RAG-system or Agent)
- **Method Name**
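
The JSONL file from step 1 can be produced with a short script. The following is a minimal sketch, assuming your predictions are held as (question, prediction) pairs; the file name `predictions.jsonl`, the helper `write_submission`, and the sample values are placeholders for illustration, not part of the benchmark:

```python
import json

# Placeholder predictions; replace these with your method's real outputs.
predictions = [
    ("question_text_1", "predicted_answer_text_1"),
    ("question_text_2", "predicted_answer_text_2"),
]


def write_submission(pairs, path):
    # Write one JSON object per line, using the two keys the submission format expects.
    with open(path, "w", encoding="utf-8") as f:
        for question, prediction in pairs:
            record = dict(question=question, prediction=prediction)
            # ensure_ascii=False keeps non-ASCII answers readable in the file.
            print(json.dumps(record, ensure_ascii=False), file=f)


# Hypothetical output path; attach the resulting file to your email.
write_submission(predictions, "predictions.jsonl")
```

Each line of the resulting file is an independent JSON object, so the file can be checked by parsing it line by line before sending.
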
Your method will be evaluated and added to the leaderboard. For reference, check out the [evaluation code](https://github.com/Alibaba-NLP/WebWalker/src/evaluate.py).
"""
CREDIT = f"""
## 🙌 Credit
This website is built using the following resources:
- **Evaluation Code**: LangChain's `cot_qa` evaluator
- **Leaderboard Code**: HuggingFaceH4's `open_llm_leaderboard`
"""
CITATION = f"""
## 🚩 Citation
If this work is helpful, please kindly cite as:
```bibtex
@article{{wu2025webwalker,
title={{WebWalker: Benchmarking LLMs in Web Traversal}},
author={{Wu, Jialong and Yin, Wenbiao and Jiang, Yong and Wang, Zhenglin and Xi, Zekun and Fang, Runnan and Zhang, Linhai and He, Yulan and Zhou, Deyu and Xie, Pengjun and others}},
journal={{arXiv preprint arXiv:2501.07572}},
year={{2025}}
}}
```
"""