Feature suggestion: average of selected (rather than all) columns

by Minus0 - opened

I'd suggest adding an optional column that averages only the benchmarks the user selects, rather than every metric in the table. Right now the average seems very sensitive to individual metrics (namely DROP, whose wide variance can shift overall model standings quite drastically), and being able to include or exclude metrics would let users tailor it to their own preferences. Ideally, this could be an optional toggle or a separate column altogether.
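To make the request concrete, here is a minimal sketch of what a "selected average" could look like. It is not the leaderboard's actual code: the function names, the model names, and every score below are made up for illustration.

```python
from statistics import mean


def selected_average(scores, selected):
    """Average only the benchmarks named in `selected` (hypothetical helper)."""
    if not selected:
        raise ValueError("select at least one benchmark")
    missing = [name for name in selected if name not in scores]
    if missing:
        raise KeyError(f"unknown benchmarks: {missing}")
    return mean(scores[name] for name in selected)


def rank(rows, selected):
    """Return model names sorted by their selected average, best first."""
    return sorted(rows, key=lambda m: selected_average(rows[m], selected), reverse=True)


# Made-up scores for two hypothetical models; NOT real leaderboard data.
rows = {
    "model-a": {"ARC": 60.0, "HellaSwag": 80.0, "MMLU": 60.0, "DROP": 10.0},
    "model-b": {"ARC": 58.0, "HellaSwag": 78.0, "MMLU": 58.0, "DROP": 50.0},
}

everything = ["ARC", "HellaSwag", "MMLU", "DROP"]
without_drop = ["ARC", "HellaSwag", "MMLU"]

print(rank(rows, everything))
print(rank(rows, without_drop))
```

With these invented numbers, the two models swap places depending on whether the one high-variance column is included, which is the sensitivity described above.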

This comment has been hidden

Agreed, and I hope they fix whatever is going wrong. In addition to that, though, there are certain metrics I tend to put more weight on than others. I enjoy using LLMs for creative writing, and HellaSwag and Winogrande are typically my two go-tos since they're more focused on logical comprehension specifically. Different people may value other benchmarks more strongly, including the more knowledge-based ones like MMLU or potentially more technical ones like GSM8K. Having the ability to sort according to user preference would be incredibly helpful, especially if more benchmarks get added in the future.
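One way to express "putting more weight on some metrics" is a weighted average used as the sort key. The sketch below is only an illustration under assumptions: `weighted_score`, the weights, the model names, and the scores are all hypothetical and do not come from the leaderboard.

```python
def weighted_score(scores, weights):
    """Weighted average over the benchmarks listed in `weights` (hypothetical helper)."""
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and sum to a positive value")
    return sum(scores[name] * w for name, w in weights.items()) / total


def sort_by_preference(rows, weights):
    """Return model names sorted by weighted score, best first."""
    return sorted(rows, key=lambda m: weighted_score(rows[m], weights), reverse=True)


# Made-up scores for two hypothetical models; NOT real leaderboard data.
rows = {
    "model-a": {"HellaSwag": 80.0, "Winogrande": 70.0, "MMLU": 50.0},
    "model-b": {"HellaSwag": 70.0, "Winogrande": 60.0, "MMLU": 95.0},
}

equal = {"HellaSwag": 1.0, "Winogrande": 1.0, "MMLU": 1.0}
comprehension_heavy = {"HellaSwag": 3.0, "Winogrande": 3.0, "MMLU": 1.0}

print(sort_by_preference(rows, equal))
print(sort_by_preference(rows, comprehension_heavy))
```

Setting a weight to zero excludes a benchmark entirely, so a plain include/exclude toggle is just a special case of this scheme.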

Open LLM Leaderboard org


We removed DROP, and explained why here! Thank you all for raising interesting points in the discussions; it allowed us to start our investigation :)
