jerome-white committed on
Commit
99722b5
1 Parent(s): 7b5deb7

Intro clarification

  1. _README.md +4 -4
_README.md CHANGED
@@ -5,7 +5,7 @@ pairs of responses to a judge who determines which response better
  addresses the prompt's intention. Rather than compare all response
  pairs, the framework sets one model as a baseline, then individually
  compares all responses to that. Its primary method of ranking models
- via win percentages over the baseline.
+ is with win percentages over the baseline.
 
  This Space presents an alternative method of ranking based on the
  [Bradley–Terry
@@ -20,12 +20,12 @@ then be used to predict outcomes between teams that have yet to play.
 
  The Alpaca project presents a good opportunity to apply BT in
  practice; especially since BT fits nicely into a Bayesian analysis
- framework. As LLMs become more pervasive, quantifying uncertainty in
- their evaluation is increasingly important; something that Bayesian
+ framework. As LLMs become more pervasive, so to is considering
+ evaluation uncertainty when comparing them; something that Bayesian
  frameworks do well.
 
  This Space is divided into two primary sections: the first presents a
  ranking of models based on estimated ability. The figure on the right
  visualizes this ranking for the top 10 models, while the table below
- it presents the full set. The second section estimates the probability
+ presents the full set. The second section estimates the probability
  that one model will be preferred to another.