Spaces: allenai/reward-bench
natolambert committed: Merge branch 'main' of https://huggingface.co/spaces/allenai/reward-bench
README.md CHANGED

@@ -6,7 +6,6 @@ colorTo: blue
 sdk: gradio
 sdk_version: 4.12.0
 app_file: app.py
-header: mini
 pinned: false
 license: apache-2.0
 ---
@@ -16,4 +15,6 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 To develop this app, it can be run with:
 ```
 gradio app.py
-```
+```
+
+Paper: https://arxiv.org/abs/2403.13787
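For context on the `gradio app.py` development command in the README: it hot-reloads whatever Gradio demo the file defines. A minimal sketch of the entry point that command expects is below; it assumes only that app.py exposes a Blocks object named `demo` and that gradio is installed (the Space pins sdk_version 4.12.0). It is not the Space's actual code.

```python
# Minimal sketch of a Gradio entry point (not this Space's actual app.py).
# `gradio app.py` reloads the module and serves the Blocks object it finds;
# `python app.py` also works via the launch() call below.
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("# RewardBench: Evaluating Reward Models")

if __name__ == "__main__":
    demo.launch()
```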
app.py CHANGED

@@ -367,7 +367,7 @@ Warning, refusals, XSTest, and donotanswer datasets have sensitive content.""")
     with gr.Accordion("π Citation", open=False):
         citation_button = gr.Textbox(
             value=r"""@misc{RewardBench,
-    title={RewardBench: Evaluating Reward Models},
+    title={RewardBench: Evaluating Reward Models for Language Modeling},
     author={Lambert, Nathan and Pyatkin, Valentina and Morrison, Jacob and Miranda, LJ and Lin, Bill Yuchen and Chandu, Khyathi and Dziri, Nouha and Kumar, Sachin and Zick, Tom and Choi, Yejin and Smith, Noah A. and Hajishirzi, Hannaneh},
     year={2024},
     howpublished={\url{https://huggingface.co/spaces/allenai/reward-bench}
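The hunk above only changes the BibTeX title string inside the citation textbox. As a point of reference, a self-contained sketch of the same UI pattern (a read-only citation box inside a collapsed accordion) in Gradio 4.x could look like the following; the component names, labels, and shortened citation here are illustrative, not the app's actual structure.

```python
import gradio as gr

# Shortened placeholder citation; the app's full BibTeX entry is in the diff above.
CITATION = r"""@misc{RewardBench,
  title={RewardBench: Evaluating Reward Models for Language Modeling},
  author={Lambert, Nathan and others},
  year={2024},
  howpublished={\url{https://huggingface.co/spaces/allenai/reward-bench}},
}"""

with gr.Blocks() as demo:
    # A collapsed accordion keeps the citation out of the way until requested.
    with gr.Accordion("Citation", open=False):
        gr.Textbox(
            value=CITATION,        # read-only BibTeX shown to the user
            lines=7,
            label="BibTeX",
            show_copy_button=True,
        )

if __name__ == "__main__":
    demo.launch()
```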
src/logo.png CHANGED
src/md.py CHANGED

@@ -9,7 +9,12 @@ We average over 4 core sections (per prompt weighting):
 2. **Chat Hard**: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
 3. **Safety**: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
 4. **Reasoning**: Includes the code and math subsets (math-prm, hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
-
+
+For Reasoning, we increase the weight of the PRM-Math subset so code and math abilities are weighed equally in the final number, rather than increasing the relevance of code.
+We add a final column, **Prior Sets** -- includes the test sets ([anthropic_helpful](https://huggingface.co/datasets/Anthropic/hh-rlhf), [anthropic_hhh](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment), [shp](https://huggingface.co/datasets/stanfordnlp/SHP), [summarize](https://huggingface.co/datasets/openai/summarize_from_feedback))
+
+Once all subsets weighted averages are achieved, the final RewardBench score is the average across the 5 subset scores.
+
 
 We include multiple types of reward models in this evaluation:
 1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
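The weighting described in the added lines (math-prm up-weighted so math balances the HumanEvalPack code subsets, then a plain average over the five section scores) can be summarized in a short sketch. This is a reconstruction from the prose above, not the repository's implementation; subset names, accuracies, and prompt counts are placeholders.

```python
# Illustrative sketch of the scoring described above -- not the repository's code.
# Assumption: each subset has a per-prompt accuracy and a prompt count, and
# math-prm is re-weighted so math counts as much as all hep-* code subsets combined.

def reasoning_score(subset_acc: dict[str, float], subset_n: dict[str, int]) -> float:
    """Per-prompt weighted accuracy, with math-prm scaled to match all code prompts."""
    code_subsets = [s for s in subset_acc if s.startswith("hep-")]
    weights = {s: float(subset_n[s]) for s in code_subsets}
    weights["math-prm"] = sum(weights.values())  # math weighs as much as all code combined
    total = sum(weights.values())
    return sum(subset_acc[s] * w for s, w in weights.items()) / total

def rewardbench_score(section_scores: dict[str, float]) -> float:
    """Final score: unweighted mean over Chat, Chat Hard, Safety, Reasoning, Prior Sets."""
    return sum(section_scores.values()) / len(section_scores)

# Example with made-up numbers:
# reasoning_score({"math-prm": 0.80, "hep-python": 0.90, "hep-go": 0.70},
#                 {"math-prm": 447, "hep-python": 164, "hep-go": 164})
```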
@@ -92,5 +97,5 @@ For more details, see the [dataset](https://huggingface.co/datasets/allenai/rewa
 TOP_TEXT = """
 # RewardBench: Evaluating Reward Models
 ### Evaluating the capabilities, safety, and pitfalls of reward models
-[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | Paper
+[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787)
 """
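For the **Sequence Classifiers** model type listed in the first src/md.py hunk, a minimal sketch of scoring a single prompt-response pair with the HuggingFace API is shown below. The checkpoint name and chat formatting are placeholders; individual reward models define their own templates, and this is not the code the leaderboard itself runs.

```python
# Illustrative sketch: scoring one (prompt, response) pair with a sequence-classifier
# reward model via HuggingFace transformers. The checkpoint name is a placeholder,
# and a real reward model's tokenizer supplies its own chat template.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "some-org/some-reward-model"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

prompt = "How do I bake bread?"
response = "Start by mixing flour, water, yeast, and salt..."

# Join prompt and response with the model's chat template (when one is defined),
# then score the pair; classifier-style reward models output a single logit.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}, {"role": "assistant", "content": response}],
    tokenize=False,
)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()  # assumes a single-logit head
print(f"reward: {reward:.3f}")
```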