[Feature Request] Adding an Additional Selector of Model Size and Releasing the Exact Command used for Evaluation
#62 by guanqun-yang - opened
As mentioned in the title, is it possible to add two additional features:
- Adding a size selector so people can choose the best model for their hardware budget.
- Releasing the exact command used for evaluation. Right now people have to go through the README.md of EleutherAI's repository, which is somewhat complicated, and the exact task identifiers and metrics you reported are not super clear (for example, what is "MMLU", and which specific metric did you use for each task: `acc` or `acc_norm`?).
If it's of any help, MMLU is `hendrycksTest-{sub}` in lm-evaluation-harness, where `sub` is a subtopic such as `abstract_algebra`. It has 57 different tasks that you can see in `lm-evaluation-harness/lm_eval/tasks/hendrycks_test.py`. You have to run all of them and average `acc_norm` across tasks to get the numbers reported on the Open LLM Leaderboard.
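For illustration, here is a minimal sketch of that averaging step, assuming an older lm-evaluation-harness checkout whose `main.py` accepts `--tasks` and `--output_path` and writes a results JSON with per-task metrics (the model name, few-shot setting, and file paths below are placeholders, not the leaderboard's exact command):

```python
# Sketch: after running the 57 hendrycksTest-* subtasks, e.g. something like
#   python main.py \
#       --model hf-causal \
#       --model_args pretrained=<your-model> \
#       --tasks hendrycksTest-abstract_algebra,hendrycksTest-anatomy,... \
#       --num_fewshot 5 \
#       --output_path results.json
# average acc_norm over the MMLU subtasks found in the results file.

import json

with open("results.json") as f:
    # Assumed layout: {"results": {"hendrycksTest-<sub>": {"acc": ..., "acc_norm": ...}, ...}}
    results = json.load(f)["results"]

# Keep only the MMLU subtasks and average acc_norm across them.
scores = [
    metrics["acc_norm"]
    for task, metrics in results.items()
    if task.startswith("hendrycksTest-")
]
print(f"MMLU (avg acc_norm over {len(scores)} tasks): {sum(scores) / len(scores):.4f}")
```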
Hi!
The MMLU used is the one in the harness, and @itanh0b is completely correct in what they say about how to run it.
We also added a model parameter count in the view!
clefourrier changed discussion status to closed