jerome-white committed on
Commit
b01565f
1 Parent(s): 6ee7b0c

Work around Hugging Face managed writes

README.md DELETED
@@ -1,38 +0,0 @@
- ---
- title: alpaca-bt-eval
- app_file: app.py
- sdk: gradio
- sdk_version: 4.19.1
- ---
- [Alpaca](https://github.com/tatsu-lab/alpaca_eval) is an LLM
- evaluation framework. It maintains a set of prompts, along with
- responses to those prompts from a collection of LLMs. It then presents
- pairs of responses to a judge that determines which response better
- addresses the prompt. Rather than compare all response pairs, the
- framework identifies a baseline model and compares all models to
- that. The standard method of ranking models is to sort by baseline
- model win percentage.
-
- This Space presents an alternative method of ranking based on the
- [Bradley–Terry
- model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model)
- (BT). Given a collection of items, Bradley–Terry estimates the
- _ability_ of each item based on pairwise comparisons between them. In
- sports, for example, that might be the ability of a given team based
- on games that team has played within a league. Once calculated,
- ability can be used to estimate the probability that one item will be
- better than another, even if those items have yet to be formally
- compared.
-
- The Alpaca project presents a good opportunity to apply BT in
- practice, especially since BT fits nicely into a Bayesian analysis
- framework. As LLMs become more pervasive, quantifying the uncertainty
- in their evaluation is increasingly important. Bayesian frameworks are
- good at that.
-
- This Space is divided into two primary sections: the first presents a
- ranking of models based on estimated ability. The figure on the right
- presents this ranking for the top 10 models, while the table below
- presents the full set. The second section estimates the probability
- that one model will be preferred to another. A final section at the
- bottom is a disclaimer that presents details about the workflow.
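
The Bradley–Terry win probability described in the deleted README can be sketched as follows. This is a minimal illustration, not part of the commit; the model names and ability values are made up:

```python
# Bradley-Terry: given positive ability scores a_i and a_j, the
# probability that item i is preferred to item j is a_i / (a_i + a_j).
def bt_win_probability(ability_i: float, ability_j: float) -> float:
    return ability_i / (ability_i + ability_j)

# Hypothetical abilities for two models (illustrative values only).
abilities = {"model-a": 2.0, "model-b": 1.0}
p = bt_win_probability(abilities["model-a"], abilities["model-b"])
print(round(p, 3))  # -> 0.667: model-a preferred with probability 2/3
```

With abilities estimated from the pairwise judgments Alpaca already collects, this formula gives a preference probability even for model pairs that were never directly compared.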
DISCLAIMER.md → _DISCLAIMER.md RENAMED
File without changes
OVERVIEW.md → _README.md RENAMED
@@ -1,9 +1,3 @@
- ---
- title: alpaca-bt-eval
- app_file: app.py
- sdk: gradio
- sdk_version: 4.19.1
- ---
  [Alpaca](https://github.com/tatsu-lab/alpaca_eval) is an LLM
  evaluation framework. It maintains a set of prompts, along with
  responses to those prompts from a collection of LLMs. It then presents