gregmialz committed on
Commit 2937740
1 Parent(s): 58e4674

Update content.py

Files changed (1)
  1. content.py +1 -1
content.py CHANGED
@@ -10,7 +10,7 @@ GAIA is made of 3 evaluation levels, depending on the added level of tooling and
 We expect the level 1 to be breakable by very good LLMs, and the level 3 to indicate a strong jump in model capabilities.
 Each of these levels is divided into two sets: a fully public dev set, on which people can test their models, and a test set with private answers and metadata. Results can be submitted for both validation and test.
 
-We expect submissions to be json-line files with the following format:
+We expect submissions to be json-line files with the following format. The first two fields are mandatory; `reasoning_trace` is optional:
 ```
 {"task_id": "task_id_1", "model_answer": "Answer 1 from your model", "reasoning_trace": "The different steps by which your model reached answer 1"}
 {"task_id": "task_id_2", "model_answer": "Answer 2 from your model", "reasoning_trace": "The different steps by which your model reached answer 2"}
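To illustrate the format this commit documents, here is a minimal sketch in Python (the language of content.py) that writes and re-reads a submission. Only the field layout comes from the diff: `task_id` and `model_answer` are mandatory, `reasoning_trace` is optional. The helper names `to_jsonl` and `parse_jsonl` are hypothetical and not part of GAIA or the leaderboard code, and the checks below cover only the two mandatory keys; the actual submission validator may enforce more.

```python
import json

# Field names taken from the submission format in content.py; the helpers are hypothetical.
REQUIRED_FIELDS = ("task_id", "model_answer")


def to_jsonl(records):
    """Serialize answer dicts to JSON-lines text: one JSON object per line."""
    out = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"record {i}: missing mandatory field(s) {missing}")
        # json.dumps escapes embedded newlines, so each record stays on a single line,
        # even if reasoning_trace spans several lines of text.
        out.append(json.dumps(rec, ensure_ascii=False) + "\n")
    return "".join(out)


def parse_jsonl(text):
    """Parse JSON-lines text back into dicts, checking the mandatory fields."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        for f in REQUIRED_FIELDS:
            if f not in rec:
                raise ValueError(f"line {lineno}: missing mandatory field {f!r}")
        records.append(rec)
    return records


if __name__ == "__main__":
    answers = [
        {"task_id": "task_id_1", "model_answer": "Answer 1 from your model",
         "reasoning_trace": "The different steps by which your model reached answer 1"},
        # reasoning_trace omitted: allowed, since it is optional
        {"task_id": "task_id_2", "model_answer": "Answer 2 from your model"},
    ]
    print(to_jsonl(answers), end="")
```

Validating on write means a record missing `task_id` or `model_answer` fails locally instead of at submission time; writing the returned string to a `.jsonl` file is left to the caller.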