AmeyaPrabhu committed on Apr 24, 2024

Commit 383926d (verified) · 1 parent: 9852685

Update contamination_report.csv

What are you reporting:

- [ ] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)

**(a) Contaminated model(s)**: ChatGPT and GPT-4

**Corresponding Contaminated corpora**:
cais/mmlu (52% and 57% respectively for ChatGPT and GPT-4)

**Approach**:

- [ ] Data-based approach
- [x] Model-based approach

Description of your method, 3-4 sentences. Evidence of data contamination (Read below):

The [paper](https://arxiv.org/abs/2311.09783) masks an incorrect answer choice in MMLU test-set questions and asks the model to fill in the masked choice. ChatGPT and GPT-4 reproduce the missing choice with Exact Match rates of 52% and 57%, respectively.
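
To make the reported number concrete, here is a minimal sketch of how an exact-match rate over masked answer choices could be computed. It is not the paper's code: the helper names (`mask_option`, `normalize`, `exact_match_rate`), the `[MASK]` token, the normalization rule, and the toy data are all assumptions made for illustration.

```python
def mask_option(options, index, mask="[MASK]"):
    """Return a copy of the option list with one entry replaced by a mask token."""
    masked = list(options)
    masked[index] = mask
    return masked


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_rate(predictions, references):
    """Percentage of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Toy data, invented for this example (not taken from MMLU).
options = ["Paris", "London", "Berlin", "Rome"]
masked_options = mask_option(options, 1)  # hide the incorrect choice "London"

model_guesses = ["London", "madrid ", "Oslo", "Cairo"]  # hypothetical model outputs
hidden_choices = ["London", "Madrid", "Vienna", "Lagos"]  # the choices that were masked
rate = exact_match_rate(model_guesses, hidden_choices)
```

A high rate on this kind of task is read as evidence of contamination because an incorrect choice is hard to guess verbatim without having seen the test item.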

Citation

Is there a paper that reports the data contamination or describes the method used to detect data contamination? Yes

**url**: https://arxiv.org/abs/2311.09783

```
@article{deng2023investigating,
  title={Investigating data contamination in modern benchmarks for large language models},
  author={Deng, Chunyuan and Zhao, Yilun and Tang, Xiangru and Gerstein, Mark and Cohan, Arman},
  journal={arXiv preprint arXiv:2311.09783},
  year={2023}
}
```

Important! If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.

Full name: Ameya Prabhu
Institution: Tübingen AI Center, University of Tübingen
Email: ameya@prabhu.be

Files changed (1)

contamination_report.csv +4 -1


@@ -462,4 +462,7 @@ bigbio/mednli;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.
 RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
 RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8

 RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
 RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
+cais/mmlu;;ChatGPT;model;0.0;0.0;52.0;model-based;https://arxiv.org/abs/2311.09783;
+cais/mmlu;;GPT-4;model;0.0;0.0;57.0;model-based;https://arxiv.org/abs/2311.09783;
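
For reviewers who want to sanity-check the two added rows, the sketch below parses them as semicolon-delimited records. The column names in `COLUMNS` are hypothetical labels chosen for this example; the diff does not show the CSV header, so the real names may differ. Reading the third numeric field as the test-split percentage is likewise an assumption, based on the description above referring to the MMLU test set.

```python
import csv
import io

# Hypothetical column labels: the CSV header is not visible in this diff.
COLUMNS = [
    "dataset", "subset", "source", "source_type",
    "train_pct", "dev_pct", "test_pct",
    "approach", "reference", "pr",
]

# The two rows added by this commit, without the leading "+" diff marker.
ADDED_ROWS = (
    "cais/mmlu;;ChatGPT;model;0.0;0.0;52.0;model-based;https://arxiv.org/abs/2311.09783;\n"
    "cais/mmlu;;GPT-4;model;0.0;0.0;57.0;model-based;https://arxiv.org/abs/2311.09783;\n"
)


def parse_rows(text):
    """Parse semicolon-delimited rows into dicts keyed by the assumed column labels."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


rows = parse_rows(ADDED_ROWS)
for row in rows:
    print(row["source"], row["test_pct"], row["approach"])
```

Both rows split into ten fields, with the subset and the final field left empty.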