Where to find the mentioned benchmark datasets?
Like I said in the screenshot, I don't think it's a good idea to release benchmark datasets. I want to have confidence that if a model scores well, that it's because it is a good model, and not because the model was trained to do well on the test questions.
Without access to the eval set, it is hard to assess the benchmark's quality and reliability: there is no clear indication of what data it is based on or what metrics it employs. This lack of transparency undermines the benchmark's trustworthiness and usefulness for assessing model performance. Why not release the eval set but keep the test set confidential?