What if a model was trained on one of the evaluation datasets?

#22
by nbroad HF staff - opened

Would you be able to tell?

Open LLM Leaderboard org

I guess it might score abnormally high on that benchmark. But in the long term, the community would realize that the model is not as good as it seems.

clefourrier changed discussion status to closed
