What are "raw" metrics?
What is the difference between the "raw" scores and the scores used in the leaderboard?
And how can you convert from the lm-eval outputs to the scores in the leaderboard? Does the lm-eval-harness output the raw score?
Hi @aginart-salesforce ,
Please check this page in our documentation about score normalization. If anything remains unclear, you can ping me here and I'll try to explain :)
I understand the logic of score normalization, but is this normalization applied when doing local leaderboard model evaluation? When I run the evaluation locally, I only get the original score, not the normalized score. Thank you @alozowski
Hi! No, you need to compute it yourself (using the snippets in the doc) to get results when doing a local evaluation. We will soon provide scripts to reproduce the scores precisely.
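For reference, here is a minimal sketch of the kind of normalization the documentation describes (assuming the raw score is an accuracy in [0, 1] and you know the task's random-guessing baseline; `normalize_score` and the example baseline value are illustrative, not the official script):

```python
# Minimal sketch (not the official leaderboard script): rescale a raw lm-eval
# accuracy so the task's random-guessing baseline maps to 0 and a perfect
# score maps to 100. The baseline (e.g. 0.25 for a 4-way multiple-choice task)
# is an assumption here; look it up per task in the leaderboard documentation.

def normalize_score(raw_score: float, random_baseline: float) -> float:
    """Map raw_score in [0, 1] to a 0-100 scale anchored at the random baseline."""
    if raw_score <= random_baseline:  # at or below chance level -> floor at 0
        return 0.0
    return (raw_score - random_baseline) / (1.0 - random_baseline) * 100.0


# Example: a raw accuracy of 0.40 on a 4-choice task (assumed baseline 0.25)
print(normalize_score(0.40, random_baseline=0.25))  # -> 20.0
```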