Why is the score for RACE so low?

#18
by thangphan68 - opened

If I understand correctly, the score for this dataset is plain accuracy on multiple-choice questions. In that case, a top score of 49% is really low. Also, the Orca2 paper reports ~80% accuracy on this dataset. Am I misunderstanding something?

hallucinations-leaderboard org

Hey @thangphan68, we use the Harness implementation of RACE, which is available here: https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/race
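For context, here is a minimal sketch of how accuracy on a multiple-choice task is commonly computed, assuming each answer option is scored by its log-likelihood and the highest-scoring option is taken as the prediction. The function names and all numbers below are hypothetical illustrations, not code or results from the Harness or the leaderboard.

```python
from typing import Sequence


def pick_option(option_logliks: Sequence[float]) -> int:
    """Return the index of the option with the highest log-likelihood."""
    return max(range(len(option_logliks)), key=lambda i: option_logliks[i])


def accuracy(all_logliks: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of questions where the top-scoring option is the gold answer."""
    if len(all_logliks) != len(gold):
        raise ValueError("need exactly one gold label per question")
    correct = sum(pick_option(scores) == g for scores, g in zip(all_logliks, gold))
    return correct / len(gold)


# Hypothetical per-option log-likelihoods for four 4-option questions.
example_logliks = [
    [-4.2, -1.3, -3.8, -5.0],  # top-scoring option: index 1
    [-2.0, -2.5, -0.9, -3.1],  # top-scoring option: index 2
    [-1.1, -2.2, -3.3, -4.4],  # top-scoring option: index 0
    [-3.0, -2.9, -2.8, -0.5],  # top-scoring option: index 3
]
example_gold = [1, 0, 0, 3]  # hypothetical gold answer indices

example_accuracy = accuracy(example_logliks, example_gold)
print(f"accuracy = {example_accuracy:.2f}")
```

RACE questions have four options, so random guessing would land at about 25%. Scores obtained this way may not be directly comparable to setups where the model generates its answer as free text, which could be one reason numbers differ between sources.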

@thangphan68 if you have suggestions on how to improve it, I can integrate your changes and do a pull request on the Harness repo!
