README.md CHANGED
@@ -8,7 +8,6 @@ Data, knowledge store and source code to reproduce the baseline experiments for
 
 
  ## NEWS:
- - 15.11.2024: We update the knowledge store for the test set, solving the potential data leaking problem. The updated the KS is released [here](https://huggingface.co/chenxwh/AVeriTeC/tree/main/data_store/knowledge_store/test_updated).
  - 20.07.2024: The knowledge store for the test set is released [here](https://huggingface.co/chenxwh/AVeriTeC/tree/main/data_store/knowledge_store/test)!
  - 18.07.2024: The test data is released [here](https://huggingface.co/chenxwh/AVeriTeC/blob/main/data/test.json)! Note that the first 1000 data points are from the same source of Averitec (`claim_id` from 0 to 999) and the rest 1215 data points (`claim_id` from 1000 to 2214) are newly constructed.
  - 15.07.2024: To facilitate human evaluation, we now ask the submission files to include a `scraped_text` field, have a look in [here](https://huggingface.co/chenxwh/AVeriTeC#format-for-submission-files) for more information!
data_store/knowledge_store/test_updated/output_test_0_499.zip DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:db5481aaec3307e483f5e15c3ad604a77089f2fe64c595b077159b69561b6842
- size 2740454196
 
 
 
 
data_store/knowledge_store/test_updated/output_test_1000_1499.zip DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e5427ad6bb6490f7889084ab8e4f7a54f436c76074b8971c6d405da444c804bb
- size 2762519869
 
 
 
 
data_store/knowledge_store/test_updated/output_test_1500_1999.zip DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1b47e9f42cfcc0363bb558bbc9801f7345c985952aa4efee592196c01cbe3fd2
- size 2819863865
 
 
 
 
data_store/knowledge_store/test_updated/output_test_2000_2214.zip DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:75033f2cf745ff1e21e955772bc400bc1eff60d19fa0f4f7f48f50e95d45962f
- size 1481402640
 
 
 
 
data_store/knowledge_store/test_updated/output_test_500_999.zip DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:bb866db83e5b3139e2bfb30ee33614976ddd0e35902a443ea6950d3e0fb87a7b
- size 2756798387
 
 
 
 
data_store/knowledge_store/test_updated/readme.md DELETED
@@ -1 +0,0 @@
- We have updated the evidence for each claim. In particular, in the previous version, there could be the fact-checking article for each claim. Thus, we remove such evidences, re-scrape the text from URLs, and construct this new KS.
 
 
src/reranking/bm25_sentences.py CHANGED
@@ -115,4 +115,4 @@ if __name__ == "__main__":
  }
  output_json.write(json.dumps(json_data, ensure_ascii=False) + "\n")
  done += 1
- output_json.flush()
+ output_file.flush()
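
Note on the `bm25_sentences.py` hunk: the context line writes through `output_json`, while the added line flushes `output_file`. Whether `output_file` is defined elsewhere in the script is not visible in this diff. As a minimal sketch only, assuming the intent is to flush each JSON line as soon as it is written, the write and the flush would normally go through the same handle. The function name `write_jsonl` and the sample records below are hypothetical and are not taken from the repository.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line and flush after each record.

    Hypothetical helper for illustration; not part of the repository.
    """
    done = 0
    with open(path, "w", encoding="utf-8") as output_json:
        for json_data in records:
            output_json.write(json.dumps(json_data, ensure_ascii=False) + "\n")
            done += 1
            # Flush the same handle that was just written to, so partial
            # results reach disk even if a later record fails.
            output_json.flush()
    return done


if __name__ == "__main__":
    # Made-up records, used only to exercise the helper.
    records = [{"claim_id": 0, "top_n": ["é"]}, {"claim_id": 1, "top_n": []}]
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "out.jsonl")
        print(write_jsonl(records, out_path))
```

Flushing per record trades some throughput for the ability to inspect or resume from a partially written output file.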