MMLU Average Score

#100
by paopao0226 - opened

Thanks for the detailed information on the "About" tab. I am confused, though: the leaderboard shows only a single MMLU score, but MMLU is made up of 57 tasks. How are the task scores combined into one number? Do you just add them up and divide by 57 (the number of tasks), or is there some other trick to the calculation? Thanks.

Open LLM Leaderboard org

If you run the harness as mentioned, it will provide an average score at the end :)

@clefourrier But when I run the harness as mentioned, it only reports the results of the individual subtasks.
python main.py --model=hf-causal-experimental --model_args="pretrained=<model_path>,use_accelerate=True" --num_fewshot=5 --device=cuda --task=hendrycksTest-* --batch_size=4 --output_path=<output_path>
Here is the command I ran :(

Open LLM Leaderboard org

Don't you get an "all" value at the end of the displayed table, or in the saved files?

Open LLM Leaderboard org

Ha, my bad, sorry: that "all" value is an internal thing we added for logging!
We just do an average :)
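A minimal sketch of that averaging, assuming "average" means an unweighted (macro) mean over the subtasks, i.e. the "add them and divide by 57" from the question. The thread doesn't say which metric is averaged, so the default `acc_norm` below is an assumption, as is the results layout (`{"results": {"hendrycksTest-<subject>": {...metrics...}}}`), which is based on what older lm-evaluation-harness versions write via `--output_path`. The function names `mmlu_average` and `mmlu_average_from_file` are hypothetical.

```python
import json
from statistics import mean


def mmlu_average(results: dict, metric: str = "acc_norm", prefix: str = "hendrycksTest-") -> float:
    """Unweighted mean of `metric` over all MMLU subtasks in a harness results dict.

    Assumed layout (not confirmed in the thread):
    {"results": {"hendrycksTest-<subject>": {"acc": ..., "acc_norm": ...}, ...}}
    The default metric "acc_norm" is also an assumption; pass metric="acc" to switch.
    """
    scores = [
        metrics[metric]
        for task, metrics in results["results"].items()
        if task.startswith(prefix)  # keep only MMLU subtasks, ignore other benchmarks
    ]
    if not scores:
        raise ValueError(f"no tasks starting with {prefix!r} found")
    # Every subtask counts equally: sum of subtask scores / number of subtasks (57 for full MMLU).
    return mean(scores)


def mmlu_average_from_file(path: str, metric: str = "acc_norm") -> float:
    # Load the JSON file written via --output_path and average its MMLU subtasks.
    with open(path) as f:
        return mmlu_average(json.load(f), metric)


if __name__ == "__main__":
    # Toy numbers for three subtasks, purely illustrative (not real model results).
    toy = {
        "results": {
            "hendrycksTest-abstract_algebra": {"acc": 0.30, "acc_norm": 0.25},
            "hendrycksTest-anatomy": {"acc": 0.50, "acc_norm": 0.45},
            "hendrycksTest-astronomy": {"acc": 0.40, "acc_norm": 0.35},
        }
    }
    print(round(mmlu_average(toy), 4))
```

Note the design choice: a macro average gives every subject the same weight regardless of how many questions it has. Weighting by question count (a micro average) would generally give a different number, so it matters which one a score was computed with when comparing against the leaderboard.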

@clefourrier Okkkk, thanks! Hope you have a good time!

paopao0226 changed discussion status to closed
