Inclusion of non-open LLMs for straightforward comparison

#7
by Supreeth - opened

Although it is orthogonal to the objective of this space, including results for closed models from Google, OpenAI, Anthropic, Cohere, etc. on the same benchmarks would help users find open-source LLMs that are close enough to the closed LLMs for their particular use case. It would greatly reduce the time spent on experimentation.

Thank you!

Seconding this, because it's easier to judge something when I already have a reference point for it. ChatGPT 3.5 and ChatGPT 4 are anchors that a lot of people are likely to know.

Hugging Face H4 org

Hi! We won't do this, as this is a leaderboard for open models, both for philosophical reasons (openness is cool) and for practical reasons. We want to ensure that the results we display are accurate and reproducible, but 1) commercial closed models can change their API, thus rendering any score obtained at a given time incorrect, and 2) we re-run everything on our cluster to ensure all models are run on the same setup, and you can't do that for these models.

clefourrier changed discussion status to closed
