Where to find the mentioned benchmark datasets?

#1
by zhiminy - opened

As shown on the main page:
image.png

Like I said in the screenshot, I don't think it's a good idea to release benchmark datasets. I want to be confident that if a model scores well, it's because it is a good model, and not because it was trained to do well on the test questions.

DontPlanToEnd changed discussion status to closed

Without access to the eval set, it is hard to assess the quality and reliability of the benchmark, since there is no clear indication of what data it is based on or which metrics it uses. That lack of transparency could undermine the benchmark's trustworthiness and its usefulness for assessing model performance. Why not release the eval set but keep the test set confidential?
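
To make the suggestion concrete, here is a minimal sketch of one way such a split could work, not a description of how this benchmark is actually built. The function names, the `salt` value, and the 20% public fraction are all hypothetical. Each question ID is hashed so that the assignment is deterministic and stays stable when new questions are added.

```python
import hashlib


def assign_split(question_id: str, public_fraction: float = 0.2,
                 salt: str = "benchmark-v1") -> str:
    """Deterministically assign a question to the public or private split.

    All names and defaults here are illustrative assumptions.
    """
    # Hash the salted ID so the assignment does not depend on list order.
    digest = hashlib.sha256(f"{salt}:{question_id}".encode("utf-8")).digest()
    # Map the first 4 bytes to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "public_eval" if bucket < public_fraction else "private_test"


def split_questions(question_ids, public_fraction: float = 0.2,
                    salt: str = "benchmark-v1") -> dict:
    """Partition question IDs into a releasable set and a held-out set."""
    splits = {"public_eval": [], "private_test": []}
    for qid in question_ids:
        splits[assign_split(qid, public_fraction, salt)].append(qid)
    return splits


if __name__ == "__main__":
    ids = [f"q{i:04d}" for i in range(1000)]
    splits = split_questions(ids)
    # Only splits["public_eval"] would be published; the rest stays private.
    print(len(splits["public_eval"]), len(splits["private_test"]))
```

With something like this, the public portion would show people the question style and scoring, while leaderboard scores would still come only from the held-out portion.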
