suryanshs16103 committed · Commit 502b10a · verified · 1 parent: 9fba4d8

Update contamination_report.csv


## What are you reporting:

- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)

**Evaluation dataset(s)**: openai_humaneval

**Contaminated model(s)**: gpt-3.5-turbo-1106, gpt-3.5-turbo-0613

**Contaminated split(s)**: 41.47% (gpt-3.5-turbo-1106) and 23.79% (gpt-3.5-turbo-0613) of the test split

## Briefly describe your method to detect data contamination

- [x] Model-based approach

#### Model-based approaches

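The cited paper reports that ChatGPT shows high contamination levels on HumanEval. The evidence is its high Average Peak and Leak Ratio values on HumanEval, in contrast to the uncontaminated CodeForces2305 dataset, on which ChatGPT's performance drops. The paper's CDD method (Contamination Detection via output Distribution) detects contamination from the peakedness of the model's output distribution, and its TED method (Trustworthy Evaluation via output Distribution) mitigates the effect of contamination on evaluation. The reported values (41.47% for gpt-3.5-turbo-1106 and 23.79% for gpt-3.5-turbo-0613) can be verified in Table 5 of the cited paper.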
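To make the idea of "peakedness" more concrete, here is a minimal Python sketch. It is a simplified illustration, not the paper's exact CDD definition. It assumes a model has already produced one greedy completion and several sampled completions for the same HumanEval prompt. It measures each sample's character-level Levenshtein distance to the greedy output and returns the fraction of samples that fall within a tolerance of `alpha` times the greedy output's length. The functions `levenshtein` and `peakedness`, the default `alpha`, and the toy strings are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute; each costs 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def peakedness(greedy: str, samples: Sequence[str], alpha: float = 0.05) -> float:
    """Hypothetical peakedness score: share of sampled outputs whose edit
    distance to the greedy output is at most alpha * len(greedy)."""
    if not samples:
        raise ValueError("need at least one sampled output")
    tolerance = alpha * len(greedy)
    close = sum(1 for s in samples if levenshtein(s, greedy) <= tolerance)
    return close / len(samples)


# Toy usage: three samples reproduce the greedy output verbatim, one does not.
greedy = "def add(a, b):\n    return a + b"
samples = [greedy, greedy, greedy, "print('hello')"]
print(peakedness(greedy, samples, alpha=0.1))  # → 0.75
```

A score near 1.0 means sampling keeps collapsing onto the greedy output, the kind of sharply peaked distribution the cited paper associates with memorized (contaminated) data. Real use would need the paper's own tolerance and decision threshold, which this sketch does not reproduce.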

## Citation

Is there a paper that reports the data contamination or describes the method used to detect data contamination?

URL: `https://arxiv.org/pdf/2402.15938`
Citation: `@misc{dong2024generalization,
title={Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models},
author={Yihong Dong and Xue Jiang and Huanyu Liu and Zhi Jin and Ge Li},
year={2024},
eprint={2402.15938},
archivePrefix={arXiv},
primaryClass={cs.CL}
}`


*Important!* If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
- Full name: Suryansh Sharma
- Institution: Indian Institute of Technology Kharagpur
- Email: suryansh.s@kgpian.iitkgp.ac.in

Files changed (1)
  1. contamination_report.csv +3 -0
contamination_report.csv CHANGED
@@ -707,3 +707,6 @@ zest;;EleutherAI/pile;;corpus;;;0.0;data-based;https://arxiv.org/abs/2310.20707;
  zest;;allenai/c4;;corpus;;;0.0;data-based;https://arxiv.org/abs/2310.20707;2
  zest;;oscar-corpus/OSCAR-2301;;corpus;;;0.0;data-based;https://arxiv.org/abs/2310.20707;2
  zest;;togethercomputer/RedPajama-Data-V2;;corpus;;;0.0;data-based;https://arxiv.org/abs/2310.20707;2
+
+ openai_humaneval;;GPT-3.5;turbo-0613;model;;;23.79;model-based;https://arxiv.org/pdf/2402.15938;
+ openai_humaneval;;GPT-3.5;turbo-1106;model;;;41.47;model-based;https://arxiv.org/pdf/2402.15938;
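The two added rows can be sanity-checked against the percentages in the PR description with a short Python sketch. It assumes the file is semicolon-delimited with 11 fields per row. The column labels below (`dataset`, `subset`, `source`, `version`, `source_type`, `train`, `dev`, `test`, `approach`, `reference`, `pr`) are hypothetical names inferred from the row layout, not the file's actual header. The blank row added just before the two new entries is skipped.

```python
import csv
import io

# Hypothetical column labels inferred from the row layout (not the real CSV header).
COLUMNS = ["dataset", "subset", "source", "version", "source_type",
           "train", "dev", "test", "approach", "reference", "pr"]

# The three lines added by this commit: one blank line, then two data rows.
ADDED_ROWS = """\

openai_humaneval;;GPT-3.5;turbo-0613;model;;;23.79;model-based;https://arxiv.org/pdf/2402.15938;
openai_humaneval;;GPT-3.5;turbo-1106;model;;;41.47;model-based;https://arxiv.org/pdf/2402.15938;
"""


def parse_rows(text: str) -> list[dict[str, str]]:
    """Parse semicolon-delimited rows into dicts, skipping blank lines."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:  # csv.reader yields [] for an empty line
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {fields}")
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


rows = parse_rows(ADDED_ROWS)
# Map each model version to its reported test-split contamination percentage.
contamination = {r["version"]: float(r["test"]) for r in rows}
print(contamination)  # {'turbo-0613': 23.79, 'turbo-1106': 41.47}
```

Keying the result by the `version` field makes the comparison with the PR description direct, and the field-count check catches any row with a missing or extra semicolon.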