Removing LLM version clutter from the leaderboard?

#20
by zarglu - opened

The top ten currently has 4 versions of GPT-4, 3 versions of Claude, and 2 versions of Gemini.

When every new LLM clearly outclassed its predecessors this was no issue, but with the incremental improvements lately, this makes the leaderboard feel cluttered and less useful.

Maybe have a main leaderboard where only the top-performing version of each major release is listed?
i.e. only one entry for GPT-4.x (the best one), only one entry for Claude 2.x, only one entry for Claude 1.x, etc.
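
For what it's worth, the filtering suggested above could be sketched roughly as follows. This is only an illustration, not how the leaderboard is actually built: the `Entry` type, the `family` field, the `best_per_family` helper, the model names, and the scores are all made-up placeholders, and it assumes each entry already carries a label for its major version.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str    # full version name as shown on the leaderboard
    family: str   # hypothetical grouping key (one per major version)
    score: float  # rating; the values below are placeholders, not real results


def best_per_family(entries):
    """Keep only the highest-scoring entry of each family, ranked by score."""
    best = {}
    for e in entries:
        current = best.get(e.family)
        if current is None or e.score > current.score:
            best[e.family] = e
    return sorted(best.values(), key=lambda e: e.score, reverse=True)


# Placeholder data with invented names and scores.
entries = [
    Entry("alpha-4-a", "alpha-4", 1200.0),
    Entry("alpha-4-b", "alpha-4", 1180.0),
    Entry("beta-2-a", "beta-2", 1150.0),
    Entry("beta-1-a", "beta-1", 1100.0),
    Entry("beta-2-b", "beta-2", 1160.0),
]

main_board = best_per_family(entries)
for e in main_board:
    print(e.family, e.model, e.score)
```

Using an explicit family label rather than parsing it out of the model name is a deliberate choice here, since version naming differs a lot between vendors. The full list could still be kept as a secondary view.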

every new LLM clearly outclassed its predecessors

They didn't, though. GPT-4-0314 was better than GPT-4-0613.

I found that GPT-3.5-turbo-0301 is mistakenly labeled as GPT-3.5-turbo-0314, apparently mixed up with the GPT-4-0314 version.
