[Feature Request] Adding an Additional Selector of Model Size and Releasing the Exact Command used for Evaluation
#62 by guanqun-yang - opened
As mentioned in the title, is it possible to add two additional features:
- Adding a size selector so people can choose the best model for their hardware budget.
- Releasing the exact command used for evaluation. Right now people have to go through the README.md of EleutherAI's repository, which is somewhat complicated, and the exact task identifiers and metrics you reported are not super clear (for example, what is "MMLU", and which specific metric did you use for each task: `acc` or `acc_norm`?).
If it's of any help, MMLU is `hendrycksTest-{sub}` in lm-evaluation-harness, where `sub` is a subtopic such as `abstract_algebra`. It has 57 different tasks that you can see in `lm-evaluation-harness/lm_eval/tasks/hendrycks_test.py`. You have to run all of them and average `acc_norm` across tasks to get the numbers reported on the Open LLM Leaderboard.
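For illustration, here is a minimal sketch of that averaging step, assuming an older lm-evaluation-harness checkout whose `main.py` accepts `--tasks` and `--output_path` and writes a results JSON with per-task metrics (the model name, few-shot setting, and file paths below are placeholders, not the leaderboard's exact command):

```python
# Sketch: after running the 57 hendrycksTest-* subtasks, e.g. something like
#   python main.py \
#       --model hf-causal \
#       --model_args pretrained=<your-model> \
#       --tasks hendrycksTest-abstract_algebra,hendrycksTest-anatomy,... \
#       --num_fewshot 5 \
#       --output_path results.json
# average acc_norm over the MMLU subtasks found in the results file.

import json

with open("results.json") as f:
    # Assumed layout: {"results": {"hendrycksTest-<sub>": {"acc": ..., "acc_norm": ...}, ...}}
    results = json.load(f)["results"]

# Keep only the MMLU subtasks and average acc_norm across them.
scores = [
    metrics["acc_norm"]
    for task, metrics in results.items()
    if task.startswith("hendrycksTest-")
]
print(f"MMLU (avg acc_norm over {len(scores)} tasks): {sum(scores) / len(scores):.4f}")
```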
Hi!
The MMLU used is the one in the harness, and @itanh0b is completely correct in what they say about how to run it.
We also added a model parameter count in the view!
clefourrier changed discussion status to closed