Running MMLU-Pro with Eleuther LM-Eval

#814
by sam-paech - opened

I wanted to take a look at the MMLU-Pro implementation in LM-Eval and run some benchmarks to reproduce the leaderboard results. But it seems the current version doesn't have MMLU-Pro implemented as a task. There's a branch that looks like it's in development, but it only contains a readme.

Is there a fork of lm-eval somewhere that you use for the leaderboard? Ideally I'd like to get the exact code, dataset, few-shot exemplars, etc., to make a 1:1 comparison with the leaderboard results.
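For context, a leaderboard-style run with the harness would look roughly like the sketch below. This is a hedged example, not confirmed by this thread: the task name (`mmlu_pro`), model checkpoint, and few-shot count are placeholders and may differ from what the leaderboard actually uses once the task is merged.

```shell
# Install the evaluation harness.
pip install lm-eval

# Hypothetical MMLU-Pro run; task name and few-shot count are assumptions.
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf \
    --tasks mmlu_pro \
    --num_fewshot 5 \
    --batch_size auto \
    --output_path results/
```

Matching the leaderboard exactly would still require the same prompt format and few-shot exemplars, which is why the fork matters.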

Thanks!

Open LLM Leaderboard org

Hi!
Yes, there is a fork. @SaylorTwift has been splitting our code into individual PRs to submit upstream to the harness, and we'll merge them all into our fork in the meantime. You'll find the first PR in the harness here, and we'll update the link to our fork once everything has been merged into it.

Ah, perfect, thank you!

sam-paech changed discussion status to closed
