LLM Evaluation not moving forward

#298
by SwatCat - opened

Hi,

Leaderboard evaluation has been stuck for a while. Is there any update on when it might be resumed?

Maybe there are just no pending models? But several recently released models aren't Llama-based, like the Baichuan series, InternLM 20B, and Qwen-14B.
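One way to check whether anything is actually queued is to inspect the submission requests directly. Below is a minimal sketch; the repo id (`open-llm-leaderboard/requests`) and the `status` field in each request file are assumptions based on how the leaderboard has historically tracked submissions, so the details may differ:

```python
# Sketch: count pending leaderboard submissions, assuming requests are stored
# as JSON files in a public dataset repo (assumed schema, may have changed).
import json

from huggingface_hub import HfApi, hf_hub_download

REQUESTS_REPO = "open-llm-leaderboard/requests"  # assumed repo id

api = HfApi()
json_files = [
    f for f in api.list_repo_files(REQUESTS_REPO, repo_type="dataset")
    if f.endswith(".json")
]

pending = 0
for path in json_files:
    local_path = hf_hub_download(REQUESTS_REPO, path, repo_type="dataset")
    with open(local_path) as fh:
        request = json.load(fh)
    # Each request file is assumed to carry a "status" field such as
    # PENDING / RUNNING / FINISHED.
    if request.get("status") == "PENDING":
        pending += 1

print(f"{pending} pending submissions out of {len(json_files)} requests")
```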

Since tech companies and research labs have recently been focusing more on multi-modality, this text-only benchmark is no longer as comprehensive an evaluation as it used to be.

Still, it's a good playground for advancing text datasets and fine-tuning techniques even further.

Open LLM Leaderboard org

Hi! We still have many models being submitted by the community. However, we are preparing an update to the evaluations used by the leaderboard, so we are taking a short pause to catch up with all the models we have already evaluated and to provide the leaderboard with even more precise benchmarks :)

@SaylorTwift What kind of update? Will the benchmark datasets be changed, or the weighting with which they contribute to the overall score?

Open LLM Leaderboard org

The current benchmarks will be left untouched, but we are experimenting with adding other benchmarks to give a better view of model performance.
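In the meantime, the existing benchmarks can be run locally with EleutherAI's lm-evaluation-harness, which the leaderboard is built on. A minimal sketch follows; the exact entry point and arguments vary across harness versions, and the model id here is just one of the models mentioned above:

```python
# Sketch: run one leaderboard-style benchmark locally with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). API details vary by version;
# this follows the simple_evaluate entry point.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    # Qwen-14B as an example model; trust_remote_code is needed for its
    # custom modeling code.
    model_args="pretrained=Qwen/Qwen-14B,trust_remote_code=True",
    tasks=["hellaswag"],  # one of the leaderboard tasks
    num_fewshot=10,       # the leaderboard evaluates HellaSwag 10-shot
    batch_size=8,
)

print(results["results"]["hellaswag"])
```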

SaylorTwift changed discussion status to closed
