natolambert committed
Commit: b988a04
Parent: f4dca79

Update src/md.py

Files changed (1): src/md.py (+1, -10)
src/md.py CHANGED
@@ -20,22 +20,13 @@ Once all subsets weighted averages are achieved, the final RewardBench score is
 We include multiple types of reward models in this evaluation:
 1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
 2. **Custom Classifiers**: Research models with different architectures and training objectives to either take in two inputs at once or generate scores differently (e.g. PairRM and Stanford SteamSHP).
-3. **DPO**: Models trained with Direct Preference Optimization (DPO), with modifiers such as `-ref-free` or `-norm` changing how scores are computed.
+3. **DPO**: Models trained with Direct Preference Optimization (DPO), with modifiers such as `-ref-free` or `-norm` changing how scores are computed. *Note*: This also includes other models trained with implicit rewards, such as those trained with [KTO](https://arxiv.org/abs/2402.01306).
 4. **Random**: Random choice baseline.
 4. **Generative**: Prompting fine-tuned models to choose between two answers, similar to MT Bench and AlpacaEval.
 
 All models are evaluated in fp16 expect for Starling-7B, which is evaluated in fp32.
 Others, such as **Generative Judge** are coming soon.
 
-### Model Types
-
-Currently, we evaluate the following model types:
-1. **Sequence Classifiers**: A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
-2. **Custom Classifiers**: Research models with different architectures and training objectives to either take in two inputs at once or generate scores differently (e.g. PairRM and Stanford SteamSHP).
-3. **DPO**: Models trained with Direct Preference Optimization (DPO) with a reference model being either the base or supervised fine-tuning checkpoint.
-
-Support of DPO models without a reference model is coming soon.
-
 ### Subset Details
 
 Total number of the prompts is: 2985, filtered from 5123.
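For context on the "Sequence Classifiers" entry above: such a reward model, loaded with HuggingFace AutoModelForSequenceClassification, maps a prompt plus a response to a single scalar score. Below is a minimal sketch of that scoring pattern; the checkpoint name and the pairwise comparison are illustrative assumptions, not the leaderboard's evaluation code.

```python
# Illustrative sketch only: the checkpoint below is an example sequence-classifier
# reward model chosen for demonstration, not necessarily one used by this leaderboard.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # assumed example RM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def score(prompt: str, response: str) -> float:
    # The classifier head reduces the (prompt, response) pair to one scalar logit.
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

prompt = "What is the capital of France?"
chosen = "The capital of France is Paris."
rejected = "The capital of France is Lyon."

# A pairwise comparison counts as correct when the chosen answer outscores the rejected one.
print(score(prompt, chosen) > score(prompt, rejected))
```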
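For the "DPO" entry, the score is an implicit reward rather than a classifier output. The sketch below shows one way to compute it, assuming two compatible HuggingFace causal LMs for the policy and its reference; the function names, `beta` value, and `ref_free` flag are illustrative assumptions rather than this repo's API.

```python
# Hedged sketch of a DPO-style implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x)).
# The `-ref-free` modifier drops the reference term; `-norm`-style variants rescale the
# score (e.g. by response length). Names here are illustrative, not the repo's API.
import torch
import torch.nn.functional as F

def response_logprob(model, input_ids: torch.Tensor, response_start: int) -> torch.Tensor:
    # Sum of log-probabilities of the response tokens, conditioned on the prompt prefix.
    with torch.no_grad():
        logits = model(input_ids).logits[:, :-1, :]
    targets = input_ids[:, 1:]
    token_logps = F.log_softmax(logits, dim=-1).gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_logps[:, response_start - 1 :].sum(dim=-1)

def implicit_reward(policy, reference, input_ids, response_start, beta=1.0, ref_free=False):
    reward = response_logprob(policy, input_ids, response_start)
    if not ref_free:
        reward = reward - response_logprob(reference, input_ids, response_start)
    return beta * reward
```

A preference pair would then be counted as correct when the implicit reward of the chosen completion exceeds that of the rejected one, mirroring how the classifier-based models are compared.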