Noob question: Training on the training portion of datasets used in benchmarking?

#564
by HankN - opened

Sorry for the noob question: is it OK to train on the training split of a dataset used in benchmarking? For example, GSM8K has a training split of 7.47k examples out of 8.79k in total (https://huggingface.co/datasets/gsm8k). Does this count as contamination? Thanks a lot.

Open LLM Leaderboard org

Hi!
Using the training split of our evaluation datasets is not contamination, as long as the training and test splits are not contaminated with one another (that is, they do not contain nearly identical questions; the LMSYS article shows examples of this, for instance for the MATH dataset).
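If you want to check a train/test pair yourself, here is a minimal sketch of one possible check. It uses a simple word n-gram overlap heuristic, which is an assumption for illustration and not the method from the LMSYS article or anything the leaderboard runs. The function names, the n-gram size, and the threshold are all hypothetical choices, and the example questions are made up.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumeric characters into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def ngrams(text: str, n: int) -> set:
    """Return the set of word n-grams of the normalized text."""
    tokens = normalize(text).split()
    if not tokens:
        return set()
    if len(tokens) < n:
        # Text shorter than n words: treat the whole text as a single n-gram.
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_overlaps(train_questions, test_questions, n=8, threshold=0.5):
    """Return (train_idx, test_idx, score) for suspiciously similar pairs.

    score is the fraction of the test question's n-grams that also occur in
    the train question. n and threshold are arbitrary illustrative defaults.
    This compares every pair, so it is quadratic in the dataset sizes.
    """
    train_grams = [ngrams(q, n) for q in train_questions]
    hits = []
    for j, test_q in enumerate(test_questions):
        test_grams = ngrams(test_q, n)
        if not test_grams:
            continue
        for i, grams in enumerate(train_grams):
            score = len(test_grams & grams) / len(test_grams)
            if score >= threshold:
                hits.append((i, j, score))
    return hits


# Made-up toy questions (not taken from any real dataset).
train = [
    "Tom has 3 apples and buys 5 more apples, how many apples does Tom have now?",
    "A train travels 60 miles per hour for 2 hours, how far does it go?",
]
test = [
    "Tom has 3 apples and buys 5 more apples. How many apples does Tom have now?",
    "Sara reads 10 pages a day for 7 days, how many pages does she read?",
]

for train_idx, test_idx, score in find_overlaps(train, test, n=4):
    print(f"train[{train_idx}] ~ test[{test_idx}] (overlap {score:.2f})")
```

To run this on a real benchmark, you could load the splits with the `datasets` library (for GSM8K, `load_dataset("gsm8k", "main")` and the `question` field of the `train` and `test` splits) and pass the two lists of questions to `find_overlaps`. Exact n-gram matching only catches near-verbatim duplicates; rephrased questions would need a fuzzier comparison.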

clefourrier changed discussion status to closed
