Where to find the mentioned benchmark datasets?
Like I said in the screenshot, I don't think it's a good idea to release benchmark datasets. I want to have confidence that if a model scores well, that it's because it is a good model, and not because the model was trained to do well on the test questions.
Without access to the eval set, it is hard to assess the benchmark's quality and reliability: there is no clear indication of what data it is based on or what metrics it employs. This lack of transparency undermines the benchmark's trustworthiness and usefulness for assessing model performance. Why not release the eval set but keep the test set confidential?