Is everything good with the benchmark?

#752
by hooking-dev - opened

I haven't seen any models being processed for days, and I wonder if something is wrong with the benchmark?

Well, I've patiently been waiting for 9 (!) days for Pantheon to finally get benchmarked, but I don't get the feeling this is being maintained so much anymore.

Unfortunately, the HF research cluster is super full at the moment; here is the small announcement about it. Your model will be evaluated, but it will take more time than usual.

The HF research cluster is super full at the moment, which means that evaluations on the open llm leaderboard will slow down 😴

However, if you feel like a recently released model is super important to have there for the community, open a discussion and we'll do our best!

I guess something big is being cooked up

Open LLM Leaderboard org

Thanks @saishf for pointing out these resources! This is entirely correct; stay posted and you'll see something cool coming from the other research teams :)

@hooking-dev thanks for your interest!
@Gryphe I'd like to point out that we evaluated about 1K models over the last month, so "not being maintained so much anymore" feels a bit unfair ^^"
Please take into account that evaluating a 70B model takes at least 10 hours, and that Hugging Face is GPU middle class, not GPU rich: we can't just suddenly increase the number of GPUs we run evaluations on. If you feel your model deserves a manual evaluation, for example because it's SOTA, please open a dedicated discussion.

Closing.

clefourrier changed discussion status to closed
