Update README.md
README.md CHANGED
@@ -26,8 +26,9 @@ The AquilaChat2-34B model is close to or exceeds the level of GPT3.5 in the subj

 The additional details of the Aquila model will be presented in the official technical report. Please stay tuned for updates on official channels.

+### Note
 <p>
-
+We have discovered a data leakage problem with the GSM8K test data in the pre-training task dataset. Therefore, the evaluation results of GSM8K have been removed from the evaluation results.

 Upon thorough investigation and analysis, it was found that the data leakage occurred in the mathematical dataset A (over 2 million samples), recommended by a team we have collaborated with multiple times. This dataset includes the untreated GSM8K test set (1319 samples). The team only performed routine de-duplication and quality checks but did not conduct an extra filtering check for the presence of the GSM8K test data, resulting in this oversight.