update test data store #6
opened by yulongchen
- README.md +0 -1
- data_store/knowledge_store/test_updated/output_test_0_499.zip +0 -3
- data_store/knowledge_store/test_updated/output_test_1000_1499.zip +0 -3
- data_store/knowledge_store/test_updated/output_test_1500_1999.zip +0 -3
- data_store/knowledge_store/test_updated/output_test_2000_2214.zip +0 -3
- data_store/knowledge_store/test_updated/output_test_500_999.zip +0 -3
- data_store/knowledge_store/test_updated/readme.md +0 -1
- src/reranking/bm25_sentences.py +1 -1
README.md CHANGED
@@ -8,7 +8,6 @@ Data, knowledge store and source code to reproduce the baseline experiments for
 
 
 ## NEWS:
-- 15.11.2024: We update the knowledge store for the test set, solving the potential data leaking problem. The updated the KS is released [here](https://huggingface.co/chenxwh/AVeriTeC/tree/main/data_store/knowledge_store/test_updated).
 - 20.07.2024: The knowledge store for the test set is released [here](https://huggingface.co/chenxwh/AVeriTeC/tree/main/data_store/knowledge_store/test)!
 - 18.07.2024: The test data is released [here](https://huggingface.co/chenxwh/AVeriTeC/blob/main/data/test.json)! Note that the first 1000 data points are from the same source of Averitec (`claim_id` from 0 to 999) and the rest 1215 data points (`claim_id` from 1000 to 2214) are newly constructed.
 - 15.07.2024: To facilitate human evaluation, we now ask the submission files to include a `scraped_text` field, have a look in [here](https://huggingface.co/chenxwh/AVeriTeC#format-for-submission-files) for more information!
data_store/knowledge_store/test_updated/output_test_0_499.zip DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:db5481aaec3307e483f5e15c3ad604a77089f2fe64c595b077159b69561b6842
-size 2740454196
data_store/knowledge_store/test_updated/output_test_1000_1499.zip DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e5427ad6bb6490f7889084ab8e4f7a54f436c76074b8971c6d405da444c804bb
-size 2762519869
data_store/knowledge_store/test_updated/output_test_1500_1999.zip DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1b47e9f42cfcc0363bb558bbc9801f7345c985952aa4efee592196c01cbe3fd2
-size 2819863865
data_store/knowledge_store/test_updated/output_test_2000_2214.zip DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:75033f2cf745ff1e21e955772bc400bc1eff60d19fa0f4f7f48f50e95d45962f
-size 1481402640
data_store/knowledge_store/test_updated/output_test_500_999.zip DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:bb866db83e5b3139e2bfb30ee33614976ddd0e35902a443ea6950d3e0fb87a7b
-size 2756798387
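Each deleted `.zip` entry above is a Git LFS pointer file, not the archive itself: it records only the LFS spec version, the SHA-256 of the real object (`oid`), and its size in bytes. As a minimal sketch (the function names and local download path are assumptions, not part of this PR), a downloaded archive can be checked against its pointer like this:

```python
import hashlib
import os

def parse_lfs_pointer(pointer_text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in pointer_text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_download(pointer_text: str, local_path: str) -> bool:
    """Check a downloaded file's size and SHA-256 against its LFS pointer."""
    fields = parse_lfs_pointer(pointer_text)
    expected_oid = fields["oid"].removeprefix("sha256:")
    expected_size = int(fields["size"])
    # Cheap size check first, then hash the file in 1 MiB chunks.
    if os.path.getsize(local_path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid
```

In practice `git lfs pull` performs this verification itself; the sketch is only useful when the archives are fetched directly over HTTP.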
data_store/knowledge_store/test_updated/readme.md DELETED
@@ -1 +0,0 @@
-We have updated the evidence for each claim. In particular, in the previous version, there could be the fact-checking article for each claim. Thus, we remove such evidences, re-scrape the text from URLs, and construct this new KS.
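The deleted `readme.md` describes the leak the updated knowledge store addressed: a claim's own fact-checking article could appear among its evidence, letting a system read off the verdict. As an illustrative sketch only (the domain list and helper names are hypothetical, not the authors' actual pipeline, which is not shown in this PR), such evidence could be dropped by URL before re-scraping:

```python
from urllib.parse import urlparse

# Hypothetical exclusion list; the real criteria for the updated
# knowledge store are not part of this diff.
FACT_CHECK_DOMAINS = {"snopes.com", "politifact.com", "factcheck.org", "fullfact.org"}

def is_fact_checking_url(url: str) -> bool:
    """Return True if the URL's host matches a known fact-checking domain."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    return host in FACT_CHECK_DOMAINS

def filter_evidence(urls: list[str]) -> list[str]:
    """Keep only evidence URLs that are not fact-checking articles."""
    return [u for u in urls if not is_fact_checking_url(u)]
```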
src/reranking/bm25_sentences.py CHANGED
@@ -115,4 +115,4 @@ if __name__ == "__main__":
             }
             output_json.write(json.dumps(json_data, ensure_ascii=False) + "\n")
             done += 1
-
+            output_file.flush()
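The one-line change to `bm25_sentences.py` flushes the output file after each record. When results are written incrementally as JSON Lines over a long run, flushing pushes each completed record out of the process's buffer, so an interruption loses at most the record in progress rather than everything still buffered. A minimal sketch of the same pattern (the record contents here are made up):

```python
import json
import tempfile

# Stand-in for the per-claim results produced inside the loop.
records = [{"claim_id": i, "top_sentences": []} for i in range(3)]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as output_json:
    done = 0
    for json_data in records:
        output_json.write(json.dumps(json_data, ensure_ascii=False) + "\n")
        done += 1
        # Flush after each record so partial results survive an interruption.
        output_json.flush()
    path = output_json.name
```

Note that `flush()` empties Python's user-space buffer; it does not force the OS to commit the bytes to disk (that would need `os.fsync`), but it is enough to make partial output visible to other processes tailing the file.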