Add Aquila2 model series, which has gsm8k test set contamination
## What are you reporting:
- [ ] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)
**Evaluation dataset(s)**: Name(s) of the evaluation dataset(s). If available in the HuggingFace Hub please write the path (e.g. `uonlp/CulturaX`), otherwise provide a link to a paper, GitHub or dataset-card.
`gsm8k`
**Contaminated model(s)**: Name of the model(s) (if any) that have been contaminated with the evaluation dataset. If available in the HuggingFace Hub please list the corresponding paths (e.g. `allenai/OLMo-7B`).
`BAAI/Aquila2-34B`, `BAAI/AquilaChat2-34B`
## Briefly describe your method to detect data contamination
- [ ] Data-based approach
- [x] Model-based approach
The [official release README](https://huggingface.co/BAAI/Aquila2-34B/blob/main/README.md) states that the pre-training data of the 34B-parameter models in the Aquila2 series is contaminated with the gsm8k test set.
![Screenshot 2024-05-05 at 7.17.12 PM.png](https://cdn-uploads.huggingface.co/production/uploads/623f2f5828672458f74879b3/sMPW2vkgWahxpAF8i0SBL.png)
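For context, model-based contamination checks of this kind are often run as completion tests: prompt the model with the first part of a held-out test question and measure how much of the reference continuation it reproduces verbatim. This is a generic illustration, not the procedure BAAI used; the gsm8k snippet and the model output below are placeholders. A minimal sketch of the scoring step:

```python
def continuation_overlap(reference: str, generated: str) -> float:
    """Fraction of reference tokens reproduced, in order, at the start of
    the model's continuation (1.0 = verbatim memorization of the item)."""
    ref_tokens = reference.split()
    gen_tokens = generated.split()
    matched = 0
    for r, g in zip(ref_tokens, gen_tokens):
        if r != g:
            break
        matched += 1
    return matched / len(ref_tokens) if ref_tokens else 0.0

# Hypothetical example: a contaminated model tends to complete the
# unseen half of a test question word for word.
prompt_half = "Natalia sold clips to 48 of her friends in April,"
reference_half = "and then she sold half as many clips in May."
model_output = "and then she sold half as many clips in May."  # placeholder output

score = continuation_overlap(reference_half, model_output)
print(score)  # 1.0 for this placeholder; scores near 1.0 across many items suggest contamination
```

In practice `model_output` would come from sampling the model under test; scores are then aggregated over the whole evaluation set rather than judged item by item.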
## Citation
Is there a paper that reports the data contamination or describes the method used to detect data contamination?
URL: https://huggingface.co/BAAI/Aquila2-34B/blob/main/README.md, https://huggingface.co/BAAI/AquilaChat2-34B/blob/main/README.md
Citation:
*Important!* If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
- Full name: Bhavish Pahwa
- Institution: Microsoft Research
- Email: t-bpahwa@microsoft.com
- contamination_report.csv +2 -0

```diff
@@ -150,6 +150,8 @@ gigaword;;togethercomputer/RedPajama-Data-V2;;corpus;;;2.82;data-based;https://a
 gsm8k;;GPT-4;;model;100.0;;1.0;data-based;https://arxiv.org/abs/2303.08774;11
 gsm8k;;GPT-4;;model;79.00;;;model-based;https://arxiv.org/abs/2311.06233;8
+gsm8k;;BAAI/AquilaChat2-34B;;model;;;100.0;model-based;https://huggingface.co/BAAI/AquilaChat2-34B/blob/main/README.md
+gsm8k;;BAAI/Aquila2-34B;;model;;;100.0;model-based;https://huggingface.co/BAAI/Aquila2-34B/blob/main/README.md
 head_qa;en;EleutherAI/pile;;corpus;;;5.11;data-based;https://arxiv.org/abs/2310.20707;2
 head_qa;en;allenai/c4;;corpus;;;5.22;data-based;https://arxiv.org/abs/2310.20707;2
```
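The rows added by this PR follow the semicolon-delimited layout of `contamination_report.csv`. The field names below are illustrative guesses inferred from the visible values, not the repository's official header; a sketch of parsing the two new rows:

```python
import csv
import io

# The two rows added by this PR, verbatim from the diff.
rows = """\
gsm8k;;BAAI/AquilaChat2-34B;;model;;;100.0;model-based;https://huggingface.co/BAAI/AquilaChat2-34B/blob/main/README.md
gsm8k;;BAAI/Aquila2-34B;;model;;;100.0;model-based;https://huggingface.co/BAAI/Aquila2-34B/blob/main/README.md
"""

# Guessed column names (not confirmed by the repository).
fields = [
    "dataset", "subset", "contaminated_source", "version", "source_type",
    "train_pct", "dev_pct", "test_pct", "approach", "reference",
]

records = [
    dict(zip(fields, row))
    for row in csv.reader(io.StringIO(rows), delimiter=";")
]
for rec in records:
    print(rec["contaminated_source"], rec["test_pct"], rec["approach"])
```

Under these assumed column names, both rows report 100.0% test-split contamination for the two BAAI models via a model-based approach, matching the README statements cited above.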