LMSYS Chatbot Arena is an LLM evaluation platform. This Space presents an alternative ranking of the models it evaluates, based on the Bradley–Terry (BT) model. The BT parameters are estimated with a Bayesian approach, in contrast to the maximum likelihood estimation (MLE) used by the LMSYS organization.
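
For readers unfamiliar with BT, the model can be written in one common form as below. The notation is an assumption made for illustration and is not taken from this Space's code: $\beta_i$ denotes the ability of model $i$ on a log scale, $D$ the observed pairwise comparisons, and ties are ignored.

$$
\Pr(i \succ j) = \frac{e^{\beta_i}}{e^{\beta_i} + e^{\beta_j}} = \frac{1}{1 + e^{-(\beta_i - \beta_j)}}
$$

MLE reports the single $\hat{\beta} = \arg\max_{\beta} \, p(D \mid \beta)$, whereas a Bayesian treatment places a prior $p(\beta)$ on the abilities and works with the full posterior:

$$
p(\beta \mid D) \propto p(D \mid \beta) \, p(\beta)
$$

The practical difference is that the posterior carries uncertainty about each ability, rather than a point estimate alone.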

This Space is divided into two primary sections. The first presents a ranking of models based on estimated ability: the figure on the right visualizes this ranking for the top 10 models, while the table below presents the full set. The second section estimates the probability that one model will be preferred over another.
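
As a rough illustration of the second section, the sketch below shows one way a preference probability could be estimated from posterior draws of two models' abilities: apply the BT formula to each paired draw and average. This is a sketch under stated assumptions rather than the Space's actual implementation. The function names are hypothetical, abilities are assumed to be on a log scale, and the draws are synthetic numbers, not Arena results.

```python
import math
import random


def preference_probability(ability_a, ability_b):
    """BT probability that A is preferred to B, given log-scale abilities."""
    return 1.0 / (1.0 + math.exp(-(ability_a - ability_b)))


def posterior_preference(samples_a, samples_b):
    """Average the BT probability over paired posterior draws."""
    if len(samples_a) != len(samples_b):
        raise ValueError("expected the same number of draws for both models")
    probs = [preference_probability(a, b) for a, b in zip(samples_a, samples_b)]
    return sum(probs) / len(probs)


# Synthetic draws standing in for posterior samples (assumption: not real data).
rng = random.Random(0)
draws_a = [rng.gauss(1.0, 0.1) for _ in range(2000)]
draws_b = [rng.gauss(0.5, 0.1) for _ in range(2000)]

p_a_over_b = posterior_preference(draws_a, draws_b)
print(f"Estimated P(A preferred to B): {p_a_over_b:.3f}")
```

Averaging over draws, instead of plugging in a single point estimate for each ability, lets uncertainty in the abilities carry through to the reported probability.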