natolambert committed
Commit d1c65aa
1 Parent(s): 9328c72

Update src/md.py

Files changed (1)
  1. src/md.py +4 -1
src/md.py CHANGED
@@ -25,7 +25,10 @@ We include multiple types of reward models in this evaluation:
 4. **Generative**: Prompting fine-tuned models to choose between two answers, similar to MT Bench and AlpacaEval.
 
 All models are evaluated in fp16 except for Starling-7B, which is evaluated in fp32.
-Others, such as **Generative Judge**, are coming soon.
+*Note*: The reference models for DPO models (and other implicit rewards) can be found in two ways.
+* Click on a specific model in the results and you'll see a key `ref_model`, e.g. [Qwen](https://huggingface.co/datasets/allenai/reward-bench-results/blob/main/eval-set/Qwen/Qwen1.5-72B-Chat.json).
+* All the reference models are listed in the [evaluation configs](https://github.com/allenai/reward-bench/blob/main/scripts/configs/eval_configs.yaml).
+
 
 ### Subset Details
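The added note says each per-model results JSON carries a `ref_model` key for DPO-style models. A minimal sketch of reading that key, using an illustrative inline excerpt (the `"model"`, `"model_type"`, and `"ref_model"` values below are hypothetical stand-ins, not copied from the real Qwen1.5-72B-Chat.json):

```python
import json

# Hypothetical excerpt of a per-model results file (e.g.
# eval-set/Qwen/Qwen1.5-72B-Chat.json in allenai/reward-bench-results);
# the field values here are illustrative, not the real file contents.
result_json = """
{
    "model": "Qwen/Qwen1.5-72B-Chat",
    "model_type": "DPO",
    "ref_model": "Qwen/Qwen1.5-72B"
}
"""

result = json.loads(result_json)
# For DPO (and other implicit-reward) models, the reference model used
# to compute the implicit reward is stored under the `ref_model` key.
ref_model = result.get("ref_model")
print(ref_model)
```

In practice you would download the JSON from the `allenai/reward-bench-results` dataset instead of embedding it; the lookup of `ref_model` is the same either way.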