Why is the score for RACE so low?
#18 by scinerd68 - opened
If I understand correctly, the score for this dataset is just accuracy on multiple-choice questions. In that case, 49% as the highest score is really low. Also, according to the Orca2 paper, that model reached ~80% accuracy on this dataset. Am I misunderstanding something?
Hey @thangphan68, we use the Harness implementation of RACE, which is available here: https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/race
@thangphan68 if you have suggestions on how to improve it, I can integrate your changes and do a pull request on the Harness repo!
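For reference when comparing numbers, here is a minimal sketch of multiple-choice accuracy, assuming each question comes with one score per answer option (for example, a log-likelihood the model assigns to that option). The function names and the toy scores are hypothetical, and this is not taken from the Harness code; the task linked above defines the actual prompt format and metric. RACE questions have four options, so random guessing would give about 25%.

```python
from typing import Sequence


def pick_answer(choice_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring answer option."""
    return max(range(len(choice_scores)), key=lambda i: choice_scores[i])


def multiple_choice_accuracy(
    all_scores: Sequence[Sequence[float]], gold: Sequence[int]
) -> float:
    """Fraction of questions where the top-scoring option is the gold one."""
    if len(all_scores) != len(gold):
        raise ValueError("need exactly one gold label per question")
    if not gold:
        return 0.0
    correct = sum(pick_answer(s) == g for s, g in zip(all_scores, gold))
    return correct / len(gold)


# Toy example (made-up scores): two questions, four options each.
scores = [
    [-1.0, -2.0, -3.0, -4.0],  # option 0 scores highest
    [-3.0, -1.0, -2.0, -5.0],  # option 1 scores highest
]
gold_labels = [0, 2]
print(multiple_choice_accuracy(scores, gold_labels))
```

Differences in how options are scored or how the prompt is built (few-shot vs. zero-shot, answer extraction from generated text, etc.) can shift this number a lot, so scores from different papers are only comparable if the setup matches.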