Need more model diversity

#64
by spaceman7777 - opened

So, I've been waiting for benchmarks on other models, such as RWKV Raven 14B and the small collection of other high-performing non-llama models.

Unfortunately, the leaderboard is 95% llama-based, so whenever there are non-llama models to benchmark, it would be best to give them a higher testing priority.

Yeah, we're targeting this for the next batch of human/GPT-4 evals.
As for which models are on the first tab, it's primarily driven by what users submit.

clefourrier changed discussion status to closed

Is there an issue with running RWKV Raven 14B? It seems to have been in the running state for something like three weeks now, and there still aren't any results for any RWKV variants. I'd guess there must be some kind of configuration issue? I assume things are semi-paused because of the blog post, though?

Anyway, I just wanted to flag that the RWKV models are most likely stuck.

(There still isn't an RWKV-based model on the leaderboard.)

spaceman7777 changed discussion status to open
Hugging Face H4 org

Hi @spaceman7777! We released a very big update of the LLM leaderboard today, and we'll focus on going through the backlog of models (some have been stuck for quite a while).

Thank you for your patience :)

clefourrier changed discussion status to closed
