GPT-3.5 Turbo HumanEval Contamination, based on "Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models"

#16

What are you reporting:

  • Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
  • Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)

Evaluation dataset(s): Name(s) of the evaluation dataset(s). If available in the HuggingFace Hub please write the path (e.g. uonlp/CulturaX), otherwise provide a link to a paper, GitHub or dataset-card.
openai_humaneval

Contaminated model(s): Name of the model(s) (if any) that have been contaminated with the evaluation dataset. If available in the HuggingFace Hub please list the corresponding paths (e.g. allenai/OLMo-7B).
gpt-3.5-turbo-0613, gpt-3.5-turbo-1106

Briefly describe your method to detect data contamination

  • Data-based approach
  • Model-based approach

The paper introduces CDD (Contamination Detection via output Distribution), a model-based approach that detects LLM contamination by measuring the peakedness of the model's output distribution, under the assumption that exposure to the data during training makes the output distribution more peaked. The paper demonstrates both a contamination case (HumanEval) and a non-contamination case (a custom subset of the CodeForces dataset).
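The peakedness idea can be sketched as follows. This is a simplified illustration inspired by the CDD approach, not the authors' exact implementation: the function names, the token-level edit distance, and the `alpha` threshold are assumptions made here for clarity.

```python
def levenshtein(a, b):
    # Token-level edit distance via the standard one-row DP.
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def cdd_peakedness(greedy_tokens, sampled_outputs, alpha=0.05):
    """Fraction of sampled outputs whose normalized edit distance to the
    greedy-decoded output is at most alpha. A value near 1 indicates a
    highly peaked output distribution, i.e. possible contamination."""
    hits = 0
    for sample in sampled_outputs:
        d = levenshtein(greedy_tokens, sample)
        norm = d / max(len(greedy_tokens), len(sample), 1)
        if norm <= alpha:
            hits += 1
    return hits / len(sampled_outputs)
```

Intuitively, if the model has memorized a benchmark solution, repeated sampling keeps reproducing near-identical outputs, so most samples fall within the small edit-distance threshold and the score approaches 1; on unseen data, samples diverge and the score stays low.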

Citation

Is there a paper that reports the data contamination or describes the method used to detect data contamination?

URL: https://arxiv.org/abs/2402.15938
Citation:
@article{dong2024generalization,
  title   = {Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models},
  author  = {Dong, Yihong and Jiang, Xue and Liu, Huanyu and Jin, Zhi and Li, Ge},
  journal = {arXiv preprint arXiv:2402.15938},
  year    = {2024}
}

Important! If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.

Workshop on Data Contamination org

Hi @jupyter31 !

Thanks for your contribution! I made some small fixes (changing the arXiv link from pdf to abs and adding the PR number).

We are merging to main.

Best,
Oscar

OSainz changed pull request status to merged
