Currently, the leaderboard is overfitted. It is inevitable because, unlike Kaggle, …
Even among my own models, the internal-evaluation ranking was mncai/agiin-13.6B-v0.1 > mncai/agiin-11.1B-v0.1 > mncai/mistral-7b-dpo-v6, yet on the leaderboard mncai/mistral-7b-dpo-v6 has the highest score.
When choosing a model from the Open LLM Leaderboard, it is best to evaluate the candidates on your own private dataset that has never been released publicly.
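One lightweight way to run that private comparison (an illustrative sketch, not part of this repo: the prompts, references, and `generate` callables below are hypothetical placeholders for your own held-out data and model wrappers):

```python
def exact_match_score(generate, eval_set):
    """Fraction of private eval prompts whose generation matches the
    reference answer exactly (after whitespace/case normalization)."""
    hits = 0
    for prompt, reference in eval_set:
        prediction = generate(prompt)
        hits += prediction.strip().lower() == reference.strip().lower()
    return hits / len(eval_set)

# Toy private eval set and two fake "models" standing in for
# leaderboard candidates -- replace with real model calls.
private_set = [("2+2=", "4"), ("capital of France?", "Paris")]
model_a = {"2+2=": "4", "capital of France?": "Paris"}.get
model_b = {"2+2=": "4", "capital of France?": "Lyon"}.get

scores = {name: exact_match_score(fn, private_set)
          for name, fn in [("model_a", model_a), ("model_b", model_b)]}
best = max(scores, key=scores.get)
print(best)  # model_a: it answers both private prompts correctly
```

Because the eval set never leaves your machine, its scores cannot be gamed the way a public benchmark can.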
### Detect-Pretrain-Code-Contamination Results

We use https://github.com/Mihaiii/detect-pretrain-code-contamination to check for benchmark contamination:

```shell
DATASET=truthful_qa
python src/run.py --target_model mncai/mistral-7b-dpo-v6 --data $DATASET --output_dir out/$DATASET --ratio_gen 0.4
```

Output: `result < 0.1, %: 0.76`
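For intuition about what a test like this measures: contamination detectors of this kind are commonly built on a Min-K% Prob membership score (whether this repo uses exactly that formulation is an assumption; the function below is an illustrative sketch, not the repo's code). Given per-token log-probabilities from the target model, it averages the k% least-likely tokens; a text the model memorized during pretraining has no very unlikely tokens, so even its worst tokens score high.

```python
import math

def min_k_prob(token_logprobs, k=0.2):
    """Min-K% Prob score: mean log-probability of the k% least likely
    tokens in a candidate text (higher = more likely memorized)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # the k% least likely tokens
    return sum(lowest) / n

# Toy example: a "memorized" sequence has uniformly high token
# probabilities, so its worst token is still quite likely.
memorized = [math.log(p) for p in [0.9, 0.8, 0.85, 0.9, 0.95]]
unseen = [math.log(p) for p in [0.9, 0.05, 0.6, 0.02, 0.7]]
assert min_k_prob(memorized) > min_k_prob(unseen)
```

The tool then reports how much of the benchmark looks memorized; a low fraction (as in the run above) suggests the evaluation data was not seen during pretraining.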
### Contact
If you have any questions, please raise an issue or contact us at dwmyoung@mnc.ai.