[LMSYS Chatbot Arena](https://lmsys.org/blog/2023-05-03-arena/) is an
LLM evaluation platform. This Space presents an alternative method of
ranking based on the [Bradley–Terry
model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model)
(BT). It estimates the BT parameters with a Bayesian approach, rather than
the maximum likelihood estimation (MLE) used by the LMSYS organization.
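
As a rough illustration of the BT model itself, the sketch below computes the
probability that one model is preferred to another from two ability
parameters. It assumes abilities are expressed on the log scale, which is one
common parameterization and not necessarily the one used in this Space; the
function name `bt_preference_probability` is hypothetical.

```python
import math


def bt_preference_probability(ability_i: float, ability_j: float) -> float:
    """Bradley-Terry probability that model i is preferred to model j.

    Assumes log-scale abilities, so the probability is the logistic
    function of the ability difference:
        exp(a_i) / (exp(a_i) + exp(a_j))
    """
    return 1.0 / (1.0 + math.exp(-(ability_i - ability_j)))
```

Under this parameterization only the difference between two abilities
matters, so equal abilities give a probability of 0.5.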

This Space has two primary sections. The first presents a ranking of
models by estimated ability: the figure on the right visualizes this
ranking for the top 10 models, and the table below presents the full
set. The second section estimates the probability that one model will
be preferred to another.
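
One way both sections could be derived from a Bayesian fit is sketched below.
It assumes the fit produces posterior draws of a log-scale ability for each
model. The model names, the draw values, and both helper functions are
made up for illustration and are not taken from this Space's code.

```python
import math
from statistics import fmean

# Hypothetical posterior draws of log-scale ability (made-up values).
posterior_draws = {
    "model_a": [1.2, 1.0, 1.1, 1.3],
    "model_b": [0.4, 0.6, 0.5, 0.3],
    "model_c": [-0.2, 0.1, 0.0, -0.1],
}


def rank_by_posterior_mean(draws):
    """Order model names from highest to lowest posterior mean ability."""
    return sorted(draws, key=lambda name: fmean(draws[name]), reverse=True)


def posterior_preference_probability(draws, model_i, model_j):
    """Average the BT preference probability over paired posterior draws."""
    per_draw = [
        1.0 / (1.0 + math.exp(-(a_i - a_j)))
        for a_i, a_j in zip(draws[model_i], draws[model_j])
    ]
    return fmean(per_draw)


ranking = rank_by_posterior_mean(posterior_draws)
p_a_over_b = posterior_preference_probability(posterior_draws, "model_a", "model_b")
```

Averaging over draws, rather than plugging in a single point estimate, lets
the uncertainty in the ability estimates carry through to the preference
probability.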