MMLU by task leaderboard
I created a leaderboard (https://huggingface.co/spaces/CoreyMorris/MMLU-by-task-Leaderboard) showing the accuracy score for each task in MMLU. I'll keep it updated at least until Hugging Face decides to create one with the breakdown by task. I'm open to suggestions for improving it.
This is a great idea! (We probably won't add one here at the moment)
Overall, I would suggest:
- removing non-MMLU scores
- adding some of the original MMLU groupings (humanities, social sciences, STEM, other) (you can find more info on the original repository)
- using a bigger widget for the table (it's hard to search in it) and possibly adding a search function.
I really like the plots. You could add some explanation of what you are plotting and why; it would really enrich your page.
Lastly, don't forget your own citation link! :)
Thanks for the suggestions!
- I made the table bigger and added some ways to filter (model size, model name, and task name).
- I also added some explanation for the plotting and my own citation.
I'll probably add the original MMLU groupings as well. I'm not sure about removing the non-MMLU scores: I want people to be able to compare those as well, but I should probably at least add some explanation and maybe have them hidden or less prominent by default.
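For the original MMLU groupings mentioned above, a minimal sketch of how per-task accuracies could be rolled up into the four groups might look like this. The mapping covers only a handful of tasks for illustration (the full mapping is in the original MMLU repository), the function name `group_accuracy` and the scores in the usage note are hypothetical, and the unweighted mean over tasks is a simplifying assumption; weighting each task by its number of questions would give different numbers.

```python
from collections import defaultdict
from statistics import mean

# Illustrative subset of a task -> grouping mapping (assumption: the
# leaderboard would use the full mapping from the original MMLU repository).
TASK_TO_GROUP = {
    "abstract_algebra": "STEM",
    "college_physics": "STEM",
    "philosophy": "humanities",
    "world_religions": "humanities",
    "econometrics": "social sciences",
    "sociology": "social sciences",
    "anatomy": "other",
    "marketing": "other",
}


def group_accuracy(task_scores: dict) -> dict:
    """Average per-task accuracies within each MMLU grouping.

    Tasks that are not in TASK_TO_GROUP (for example non-MMLU scores)
    are skipped. This is an unweighted mean over tasks.
    """
    by_group = defaultdict(list)
    for task, score in task_scores.items():
        group = TASK_TO_GROUP.get(task)
        if group is not None:
            by_group[group].append(score)
    return {group: mean(scores) for group, scores in by_group.items()}
```

Usage, with made-up scores for one model: `group_accuracy({"abstract_algebra": 0.25, "college_physics": 0.75, "philosophy": 0.5, "arc:challenge": 0.9})` averages the two STEM tasks, reports the single humanities task as-is, and ignores the non-MMLU entry.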
Thanks again for the feedback!