Running MMLU-Pro with Eleuther LM-Eval

#814
by sam-paech - opened

I wanted to take a look at the MMLU-Pro implementation in LM-Eval and run some benchmarks to reproduce the leaderboard results. But it seems the current version doesn't have MMLU-Pro implemented as a task. There's a branch that looks like it's in development, but it only contains a readme.

Is there a fork of lm-eval somewhere that you use for the leaderboard? Ideally I'd like to get the exact code, dataset, few-shot exemplars, etc., to make a 1:1 comparison with the leaderboard results.
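For context, a leaderboard-style run with the harness would look roughly like the sketch below. This is a hedged example, not confirmed by this thread: the task name (`mmlu_pro`), model checkpoint, and few-shot count are placeholders and may differ from what the leaderboard actually uses once the task is merged.

```shell
# Install the evaluation harness.
pip install lm-eval

# Hypothetical MMLU-Pro run; task name and few-shot count are assumptions.
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf \
    --tasks mmlu_pro \
    --num_fewshot 5 \
    --batch_size auto \
    --output_path results/
```

Matching the leaderboard exactly would still require the same prompt format and few-shot exemplars, which is why the fork matters.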

Thanks!

Open LLM Leaderboard org

Hi!
Yes, there is a fork. @SaylorTwift has been splitting our code into individual PRs to submit upstream to the harness, and we'll merge them all into our fork in the meantime. You'll find the first PR in the harness here, and we'll update the link to our fork once everything has been merged into it.

Ah, perfect, thank you!

sam-paech changed discussion status to closed
