Spaces: jerome-white/llm-bradley-terry
jerome-white committed on Feb 22

Commit 99722b5 • 1 Parent(s): 7b5deb7

Intro clarification

Files changed (1)

_README.md CHANGED

@@ -5,7 +5,7 @@ pairs of responses to a judge who determines which response better
 addresses the prompt's intention. Rather than compare all response
 pairs, the framework sets one model as a baseline, then individually
 compares all responses to that. Its primary method of ranking models
-via win percentages over the baseline.
+is with win percentages over the baseline.
 This Space presents an alternative method of ranking based on the
 [Bradley–Terry
@@ -20,12 +20,12 @@ then be used to predict outcomes between teams that have yet to play.
 The Alpaca project presents a good opportunity to apply BT in
 practice; especially since BT fits nicely into a Bayesian analysis
-framework. As LLMs become more pervasive, quantifying uncertainty in
-their evaluation is increasingly important; something that Bayesian
+framework. As LLMs become more pervasive, so to is considering
+evaluation uncertainty when comparing them; something that Bayesian
 frameworks do well.
 This Space is divided into two primary sections: the first presents a
 ranking of models based on estimated ability. The figure on the right
 visualizes this ranking for the top 10 models, while the table below
-it presents the full set. The second section estimates the probability
+presents the full set. The second section estimates the probability
 that one model will be preferred to another.