bpHigh committed
Commit 9cf7873 • 1 Parent(s): 95be02e

Add Aquila model series which have gsm8k test set contamination


## What are you reporting:
- [ ] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)

**Evaluation dataset(s)**: Name(s) of the evaluation dataset(s). If available in the HuggingFace Hub please write the path (e.g. `uonlp/CulturaX`), otherwise provide a link to a paper, GitHub or dataset-card.
`gsm8k`
**Contaminated model(s)**: Name of the model(s) (if any) that have been contaminated with the evaluation dataset. If available in the HuggingFace Hub please list the corresponding paths (e.g. `allenai/OLMo-7B`).
`BAAI/Aquila2-34B`, `BAAI/AquilaChat2-34B`
## Briefly describe your method to detect data contamination

- [ ] Data-based approach
- [x] Model-based approach

The [official release README](https://huggingface.co/BAAI/Aquila2-34B/blob/main/README.md) states that the pre-training data of the 34B-parameter Aquila2 models was contaminated with the gsm8k test set.

![Screenshot 2024-05-05 at 7.17.12 PM.png](https://cdn-uploads.huggingface.co/production/uploads/623f2f5828672458f74879b3/sMPW2vkgWahxpAF8i0SBL.png)
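This report relies on the statement in the official README rather than on an independent probe. For readers who want to spot-check the released checkpoints themselves, the following is a minimal, unofficial sketch (not the method used here): it prompts the chat model with gsm8k test questions and flags completions that closely reproduce the reference solutions. The model ID, the `trust_remote_code` flag, the sample size, and the similarity heuristic are all assumptions.

```python
# Minimal, unofficial memorization spot-check (NOT the method used in this report).
# Assumptions: datasets, transformers and accelerate are installed, there is enough
# GPU memory for a 34B model, and BAAI/AquilaChat2-34B loads with trust_remote_code=True.
from difflib import SequenceMatcher

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "BAAI/AquilaChat2-34B"  # assumed model ID, matches the report above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", trust_remote_code=True
)

# gsm8k test split: each row has a "question" and a step-by-step "answer".
test_set = load_dataset("gsm8k", "main", split="test").select(range(20))

for row in test_set:
    inputs = tokenizer(row["question"], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # High similarity to the reference solution suggests memorization,
    # although it is not proof of contamination on its own.
    similarity = SequenceMatcher(None, completion, row["answer"]).ratio()
    print(f"similarity={similarity:.2f}  question={row['question'][:60]!r}")
```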


## Citation

Is there a paper that reports the data contamination or describes the method used to detect data contamination?

URL: https://huggingface.co/BAAI/Aquila2-34B/blob/main/README.md, https://huggingface.co/BAAI/AquilaChat2-34B/blob/main/README.md
Citation:


*Important!* If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
- Full name: Bhavish Pahwa
- Institution: Microsoft Research
- Email: t-bpahwa@microsoft.com

Files changed (1)
  1. contamination_report.csv +2 -0
contamination_report.csv CHANGED

@@ -150,6 +150,8 @@ gigaword;;togethercomputer/RedPajama-Data-V2;;corpus;;;2.82;data-based;https://a
 
 gsm8k;;GPT-4;;model;100.0;;1.0;data-based;https://arxiv.org/abs/2303.08774;11
 gsm8k;;GPT-4;;model;79.00;;;model-based;https://arxiv.org/abs/2311.06233;8
+gsm8k;;BAAI/AquilaChat2-34B;;model;;;100.0;model-based;https://huggingface.co/BAAI/AquilaChat2-34B/blob/main/README.md
+gsm8k;;BAAI/Aquila2-34B;;model;;;100.0;model-based;https://huggingface.co/BAAI/Aquila2-34B/blob/main/README.md
 
 head_qa;en;EleutherAI/pile;;corpus;;;5.11;data-based;https://arxiv.org/abs/2310.20707;2
 head_qa;en;allenai/c4;;corpus;;;5.22;data-based;https://arxiv.org/abs/2310.20707;2