Availability of LLM evaluations like those in the HF blog post

#24
by sjrhuschlee - opened

Hello, I just saw this blog post https://huggingface.co/blog/zero-shot-eval-on-the-hub, which I am really excited about! I wanted to ask when the example from the blog post will be available in the HF leaderboards.

Evaluation on the Hub org

Hi @sjrlee! Great feature request. Gently pinging @Tristan to add this task to the leaderboards so people can view the scores from the LLM evaluations on https://huggingface.co/spaces/autoevaluate/leaderboards?dataset=-any-

Evaluation on the Hub org

Hi @sjrlee, you can find the evaluations from the zero-shot pipeline under the text-generation task on the leaderboards, e.g. https://huggingface.co/spaces/autoevaluate/leaderboards?dataset=mathemakitten%2Fwinobias_antistereotype_test

Feel free to close this issue if that addresses your question.

Thanks, @lewtun! That is great to see. Before I close this, I wanted to ask whether there are plans to add the results for all the model sizes shown in the blog post.
