[LMSYS Chatbot Arena](https://lmsys.org/blog/2023-05-03-arena/) is an
LLM evaluation platform. This Space presents an alternative method of
ranking based on the [Bradley–Terry
model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model)
(BT). It takes a Bayesian approach to BT parameter estimation, in
contrast to the maximum likelihood estimation (MLE) approach used by
the LMSYS organization.
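
For readers unfamiliar with BT, the following is a minimal sketch of the model's core formula: the probability that one model is preferred to another depends only on the difference in their abilities. It assumes abilities are expressed on the log scale, which is a common parametrization but not necessarily the one this Space uses; the function name `bt_preference_probability` is hypothetical.

```python
import math


def bt_preference_probability(ability_i: float, ability_j: float) -> float:
    """Bradley-Terry probability that model i is preferred to model j.

    Assumes log-scale abilities, so the probability is the logistic
    function of the ability difference.
    """
    return 1.0 / (1.0 + math.exp(-(ability_i - ability_j)))


# Two equally able models are each preferred half the time.
p_equal = bt_preference_probability(0.0, 0.0)

# A one-unit ability advantage gives a preference probability of about 0.73.
p_stronger = bt_preference_probability(1.0, 0.0)
```

An MLE fit would report a single ability value per model, whereas a Bayesian fit yields a distribution over each ability, so a quantity such as `p_stronger` can itself carry uncertainty.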

This Space is divided into two primary sections. The first presents a
ranking of models based on estimated ability: the figure on the right
visualizes this ranking for the top 10 models, while the table below
presents the full set. The second section estimates the probability
that one model will be preferred to another.
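
As an illustration of how both sections could be derived from a Bayesian fit, the sketch below ranks models by posterior mean ability and estimates a preference probability by averaging over posterior draws. Everything here is an assumption made for the example: the model names, the draw values, the log-scale ability parametrization, and the helper names `rank_by_mean_ability` and `preference_probability`. It is not the code behind this Space.

```python
import math
from statistics import mean

# Hypothetical posterior draws: each dict is one joint draw of
# log-scale abilities. The values are made up for illustration.
posterior_draws = [
    {"model-a": 1.2, "model-b": 0.4, "model-c": -0.3},
    {"model-a": 0.9, "model-b": 0.7, "model-c": -0.1},
    {"model-a": 1.4, "model-b": 0.2, "model-c": -0.5},
    {"model-a": 1.1, "model-b": 0.5, "model-c": 0.0},
]


def rank_by_mean_ability(draws):
    """Return model names sorted by posterior mean ability, best first."""
    names = draws[0].keys()
    means = {name: mean([d[name] for d in draws]) for name in names}
    return sorted(means, key=means.get, reverse=True)


def preference_probability(draws, model_i, model_j):
    """Average the BT preference probability over all posterior draws."""
    return mean(
        [1.0 / (1.0 + math.exp(-(d[model_i] - d[model_j]))) for d in draws]
    )


ranking = rank_by_mean_ability(posterior_draws)
p_a_over_b = preference_probability(posterior_draws, "model-a", "model-b")
```

Averaging over draws, rather than plugging in a single point estimate, lets uncertainty about each model's ability carry through to the reported preference probability.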