natolambert committed • Commit d1c65aa • Parent: 9328c72

Update src/md.py
src/md.py CHANGED
```diff
@@ -25,7 +25,10 @@ We include multiple types of reward models in this evaluation:
 4. **Generative**: Prompting fine-tuned models to choose between two answers, similar to MT Bench and AlpacaEval.
 
 All models are evaluated in fp16 except for Starling-7B, which is evaluated in fp32.
-
+*Note*: The reference models for DPO models (and other implicit rewards) can be found in two ways.
+* Click on a specific model in results and you'll see a key `ref_model`, e.g. [Qwen](https://huggingface.co/datasets/allenai/reward-bench-results/blob/main/eval-set/Qwen/Qwen1.5-72B-Chat.json).
+* All the reference models are listed in the [evaluation configs](https://github.com/allenai/reward-bench/blob/main/scripts/configs/eval_configs.yaml).
+
 
 ### Subset Details
 
```
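For the first lookup path in the added note, here is a minimal sketch of fetching a result file from the results dataset and reading its `ref_model` key. It assumes the `huggingface_hub` package is installed; the file path simply mirrors the Qwen example linked above.

```python
# Sketch: look up the DPO reference model recorded in a reward-bench result file.
import json

from huggingface_hub import hf_hub_download

# Download one result JSON from the allenai/reward-bench-results dataset
# (path taken from the Qwen example in the note above).
path = hf_hub_download(
    repo_id="allenai/reward-bench-results",
    repo_type="dataset",
    filename="eval-set/Qwen/Qwen1.5-72B-Chat.json",
)

with open(path) as f:
    result = json.load(f)

# `ref_model` names the reference model used to compute the implicit reward.
print(result.get("ref_model"))
```

The second path needs no code: the same mapping is listed statically in the linked `eval_configs.yaml`.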