Bayesian Elo scores

#1
by passaglia - opened

Hi H4 team,

Thanks for the great leaderboard. I'd like to suggest estimating the model strengths with a Bayesian approach instead of the sequential Elo update formula. It's straightforward to implement, and I have a notebook that does it (a Bradley-Terry model) here: https://github.com/yuzu-ai/japanese-llm-ranking/blob/main/jrank/bradley-terry.ipynb . It gives optimal estimates of the model strengths plus Bayesian confidence (credible) regions. I'm happy to help with the implementation.
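For readers unfamiliar with the approach, here is a minimal sketch of a Bayesian Bradley-Terry fit. It is not the code from the linked notebook. It assumes ties are discarded, a N(0, 1) prior on each model's strength, and a Laplace (Gaussian) approximation around the maximum a posteriori (MAP) estimate instead of full posterior sampling; the function names and toy data are hypothetical. Under Bradley-Terry, P(i beats j) = sigmoid(s_i - s_j), and Elo's 10^(Δ/400) curve means one unit of s corresponds to 400/ln 10 ≈ 173.7 Elo points (the 1000 offset below is arbitrary).

```python
import numpy as np

ELO_SCALE = 400 / np.log(10)  # natural-log strength units -> Elo points


def _grad_hess(s, w, l, prior_var):
    """Gradient and Hessian of the log posterior (Bradley-Terry likelihood + Gaussian prior)."""
    p = 1.0 / (1.0 + np.exp(-(s[w] - s[l])))  # P(winner beats loser)
    grad = -s / prior_var
    np.add.at(grad, w, 1.0 - p)
    np.add.at(grad, l, p - 1.0)
    h = p * (1.0 - p)
    hess = -np.eye(len(s)) / prior_var  # the prior keeps this negative definite
    np.add.at(hess, (w, w), -h)
    np.add.at(hess, (l, l), -h)
    np.add.at(hess, (w, l), h)
    np.add.at(hess, (l, w), h)
    return grad, hess


def fit_bradley_terry(n_models, games, prior_sd=1.0, max_iter=100, tol=1e-10):
    """Hypothetical helper. games: list of (winner_index, loser_index) pairs.

    Returns the posterior mode and the Laplace-approximate posterior std. dev.
    """
    w = np.array([g[0] for g in games], dtype=int)
    l = np.array([g[1] for g in games], dtype=int)
    prior_var = prior_sd ** 2
    s = np.zeros(n_models)
    for _ in range(max_iter):
        grad, hess = _grad_hess(s, w, l, prior_var)
        step = np.linalg.solve(hess, grad)
        s = s - step  # Newton step on a concave objective
        if np.max(np.abs(step)) < tol:
            break
    # Laplace approximation: covariance = inverse of the negative Hessian at the mode
    _, hess = _grad_hess(s, w, l, prior_var)
    cov = np.linalg.inv(-hess)
    return s, np.sqrt(np.diag(cov))


# Hypothetical toy data: model 0 beats model 1 in 8 of 10 games, and so on.
games = ([(0, 1)] * 8 + [(1, 0)] * 2 + [(1, 2)] * 7 + [(2, 1)] * 3
         + [(0, 2)] * 9 + [(2, 0)] * 1)
mean, sd = fit_bradley_terry(3, games)
for i, (m, e) in enumerate(zip(mean, sd)):
    elo = 1000 + ELO_SCALE * m
    lo, hi = elo - 1.96 * ELO_SCALE * e, elo + 1.96 * ELO_SCALE * e
    print(f"model {i}: {elo:.0f} Elo (approx. 95% interval {lo:.0f} .. {hi:.0f})")
```

Because all games are fitted jointly, the result does not depend on the order in which games were played, unlike sequential Elo updates. The interval width also shrinks as a model accumulates more games.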

Cheers,
Sam
