[LMSYS Chatbot Arena](https://lmsys.org/blog/2023-05-03-arena/) is an
LLM evaluation platform. This Space presents an alternative method of
ranking based on the [Bradley–Terry
model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model)
(BT). Unlike the maximum likelihood estimation (MLE) approach used by
the LMSYS organization, this Space takes a Bayesian approach to
estimating the BT parameters.
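As a minimal sketch of what Bayesian BT estimation can look like (not necessarily how this Space implements it), the code below assumes each model `i` has an ability `theta_i` with an independent Normal(0, `prior_sd`) prior, and that model `i` is preferred to model `j` with probability exp(theta_i) / (exp(theta_i) + exp(theta_j)), which equals the logistic sigmoid of `theta_i - theta_j`. It draws posterior samples with a plain random-walk Metropolis sampler. The function names, the prior, and the tuning parameters (`step`, `burn_in`, `n_samples`) are illustrative assumptions. An MLE approach would instead return the single `theta` that maximizes the likelihood.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_posterior(theta, comparisons, prior_sd=1.0):
    # Unnormalized log posterior of the abilities `theta`.
    # comparisons: list of (winner, loser) index pairs, one per battle.
    # Assumed prior: independent Normal(0, prior_sd) on each ability.
    lp = sum(-0.5 * (t / prior_sd) ** 2 for t in theta)
    for winner, loser in comparisons:
        # BT log-likelihood: log sigmoid(d) = -softplus(-d), d = theta_w - theta_l.
        lp -= softplus(-(theta[winner] - theta[loser]))
    return lp


def sample_abilities(n_models, comparisons, n_samples=4000, burn_in=1000,
                     step=0.3, seed=0):
    # Random-walk Metropolis sampler over the vector of abilities.
    rng = random.Random(seed)
    theta = [0.0] * n_models
    current = log_posterior(theta, comparisons)
    draws = []
    for it in range(burn_in + n_samples):
        # Propose a Gaussian perturbation of every ability at once.
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        candidate = log_posterior(proposal, comparisons)
        # Accept with probability min(1, exp(candidate - current));
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if it >= burn_in:
            draws.append(list(theta))
    return draws


def posterior_means(draws):
    # Average each model's ability over the posterior draws.
    return [sum(d[i] for d in draws) / len(draws) for i in range(len(draws[0]))]


if __name__ == "__main__":
    # Hypothetical toy battles between three models (indices 0, 1, 2).
    battles = ([(0, 1)] * 8 + [(1, 0)] * 2 +
               [(1, 2)] * 8 + [(2, 1)] * 2 +
               [(0, 2)] * 9 + [(2, 0)] * 1)
    draws = sample_abilities(3, battles)
    means = posterior_means(draws)
    ranking = sorted(range(3), key=lambda i: means[i], reverse=True)
    print("posterior means:", [round(m, 2) for m in means])
    print("ranking (best first):", ranking)
```

Ranking by posterior mean is one simple choice; the spread of the draws also gives an uncertainty band for each model's ability, which a point estimate from MLE does not provide directly. A real deployment would more likely rely on an established probabilistic programming tool than on a hand-written sampler like this one.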

This Space is divided into two primary sections. The first ranks
models by their estimated ability: the figure on the right visualizes
this ranking for the top 10 models, and the table below lists all
models. The second section estimates the probability that one model
will be preferred over another.
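One hedged way to turn posterior draws into a preference probability is to average the BT win probability over the draws, which carries the uncertainty in the abilities through to the result instead of plugging in a single point estimate. This sketch assumes `draws` is a list of ability vectors (one per posterior sample, one entry per model); `preference_probability` and `preference_matrix` are hypothetical helpers, not this Space's API.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_probability(draws, i, j):
    # Posterior-averaged BT probability that model i is preferred to model j:
    # the mean over draws of sigmoid(theta_i - theta_j).
    return sum(sigmoid(d[i] - d[j]) for d in draws) / len(draws)


def preference_matrix(draws):
    # Pairwise probabilities for every ordered pair of models.
    n = len(draws[0])
    return [[preference_probability(draws, i, j) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical posterior draws for two models (toy numbers, not real results).
    toy_draws = [[1.0, 0.0], [0.5, 0.5], [2.0, 1.0]]
    print(round(preference_probability(toy_draws, 0, 1), 4))  # → 0.654
```

Because sigmoid(x) + sigmoid(-x) = 1, the matrix satisfies P(i over j) + P(j over i) = 1 and has 0.5 on its diagonal, so only the upper triangle carries new information.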