hsaest committed on
Commit
60276d5
1 Parent(s): b82f311

Update content.py

  1. content.py +9 -7
content.py CHANGED
@@ -1,16 +1,16 @@
 TITLE = """<h1 align="center" id="space-title">TravelPlanner Leaderboard</h1>"""
 
 INTRODUCTION_TEXT = """
-TravelPlanner is a benchmark crafted for evaluating language agents in tool-use and complex planning within multiple constraints. (See our [paper](https://arxiv.org/pdf/2402.01622.pdf) for more details.)
+TravelBench is a benchmark crafted for evaluating language agents in tool-use and complex planning within multiple constraints. (See our [paper](https://arxiv.org/abs/2311.12983) for more details.)
 
 ## Data
-In TravelPlanner, for a given query, language agents are expected to formulate a comprehensive plan that includes transportation, daily meals, attractions, and accommodation for each day.
+In TravelBench, for a given query, language agents are expected to formulate a comprehensive plan that includes transportation, daily meals, attractions, and accommodation for each day.
 For constraints, from the perspective of real world applications, we design three types of them: Environment Constraint, Commonsense Constraint, and Hard Constraint.
-TravelPlanner comprises 1,225 queries in total. The number of days and hard constraints are designed to test agents' abilities across both the breadth and depth of complex planning.
+TravelBench comprises 1,225 queries in total. The number of days and hard constraints are designed to test agents' abilities across both the breadth and depth of complex planning.
 
-TravelPlanner data can be found in [this dataset](https://huggingface.co/datasets/osunlp/TravelPlanner).
+TravelBench data can be found in [this dataset](https://huggingface.co/datasets/osunlp/TravelBench).
 
-## Submission Guidelines for TravelPlanner
+## Submission Guidelines for TravelBench
 Participants are invited to submit results for both validation and testing phases. The submissions will be evaluated based on several metrics: delivery rate, commonsense constraint pass rate (micro/macro), hard constraint pass rate (micro/macro), and the final pass rate.
 
 ### Format of Submission:
@@ -39,10 +39,12 @@ Format: Use "Name, City" to specify the chosen restaurant and its location. If a
 Description: Information about attractions visited.
 Format: List attractions as "Name, City". If visiting multiple attractions, separate them with a semicolon ";". If no attraction is planned, use "-".
 
-Please refer to [this](https://huggingface.co/datasets/osunlp/TravelPlanner/resolve/main/example_submission.jsonl?download=true) for example submission file.
+Please refer to [this](https://huggingface.co/datasets/osunlp/TravelBench/resolve/main/example_submission.jsonl?download=true) for example submission file.
 
-Submission made by our team are labelled "TravelPlanner Team". Each submission will be automatically evaluated and scored based on the predefined metrics. The scores and rankings will be updated and displayed on the leaderboard.
+Submission made by our team are labelled "TravelBench Team". Each submission will be automatically evaluated and scored based on the predefined metrics. You can then obtain the scores and download the detailed constraint pass rates after the evaluation.
 
+## Show Your Results on Leaderborad
+If you are interested in featuring your results on our leaderboard, we invite you to reach out to us. Please send an email to [us](mailto:jianx0321@gmail.com) including the following details: evaluation mode, fondation model, tool-use strategy, planning strategy, organization, and your paper link (if available), along with your submission files.
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
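
The attraction format quoted in content.py ("Name, City" entries separated by ";", with "-" meaning no attraction is planned) could be checked before submitting with something like the sketch below. The helper name `parse_attractions` is hypothetical and is not part of the leaderboard's evaluation code. Splitting on the last comma and tolerating a trailing semicolon are assumptions, not rules stated in the guidelines.

```python
from typing import List, Tuple


def parse_attractions(field: str) -> List[Tuple[str, str]]:
    """Parse an attraction field into (name, city) pairs.

    Hypothetical helper: it follows the format text in content.py,
    not the leaderboard's actual evaluation code.
    """
    field = field.strip()
    if field == "-":  # "-" means no attraction is planned
        return []

    pairs = []
    for entry in field.split(";"):
        entry = entry.strip()
        if not entry:  # assumption: tolerate a trailing semicolon
            continue
        # Assumption: split on the last comma so names containing commas stay intact.
        name, sep, city = entry.rpartition(",")
        if not sep:
            raise ValueError(f"expected 'Name, City', got {entry!r}")
        pairs.append((name.strip(), city.strip()))
    return pairs
```

For example, `parse_attractions("The Alamo, San Antonio; River Walk, San Antonio")` yields two `(name, city)` tuples, and an entry without a comma raises `ValueError` so malformed rows surface before upload.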