natolambert committed
Commit 18596de • 2 Parent(s): 7eaa6d2 8c39b2a

Merge branch 'main' of https://huggingface.co/spaces/allenai/reward-bench

Files changed (4)
  1. README.md +3 -2
  2. app.py +1 -1
  3. src/logo.png +0 -0
  4. src/md.py +7 -2
README.md CHANGED
@@ -6,7 +6,6 @@ colorTo: blue
 sdk: gradio
 sdk_version: 4.12.0
 app_file: app.py
-header: mini
 pinned: false
 license: apache-2.0
 ---
@@ -16,4 +15,6 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 To develop this app, it can be run with:
 ```
 gradio app.py
-```
+```
+
+Paper: https://arxiv.org/abs/2403.13787
app.py CHANGED
@@ -367,7 +367,7 @@ Warning, refusals, XSTest, and donotanswer datasets have sensitive content.""")
     with gr.Accordion("📚 Citation", open=False):
         citation_button = gr.Textbox(
             value=r"""@misc{RewardBench,
-title={RewardBench: Evaluating Reward Models},
+title={RewardBench: Evaluating Reward Models for Language Modeling},
 author={Lambert, Nathan and Pyatkin, Valentina and Morrison, Jacob and Miranda, LJ and Lin, Bill Yuchen and Chandu, Khyathi and Dziri, Nouha and Kumar, Sachin and Zick, Tom and Choi, Yejin and Smith, Noah A. and Hajishirzi, Hannaneh},
 year={2024},
 howpublished={\url{https://huggingface.co/spaces/allenai/reward-bench}
src/logo.png CHANGED
src/md.py CHANGED
@@ -9,7 +9,12 @@ We average over 4 core sections (per prompt weighting):
 2. **Chat Hard**: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
 3. **Safety**: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
 4. **Reasoning**: Includes the code and math subsets (math-prm, hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
-5. **Prior Sets**: Includes the test sets ([anthropic_helpful](https://huggingface.co/datasets/Anthropic/hh-rlhf), [anthropic_hhh](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment), [shp](https://huggingface.co/datasets/stanfordnlp/SHP), [summarize](https://huggingface.co/datasets/openai/summarize_from_feedback))
+
+For Reasoning, we increase the weight of the PRM-Math subset so that code and math abilities are weighed equally in the final number, rather than increasing the relevance of code.
+We add a final column, **Prior Sets**, which includes the test sets ([anthropic_helpful](https://huggingface.co/datasets/Anthropic/hh-rlhf), [anthropic_hhh](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment), [shp](https://huggingface.co/datasets/stanfordnlp/SHP), [summarize](https://huggingface.co/datasets/openai/summarize_from_feedback))
+
+Once all subset weighted averages are computed, the final RewardBench score is the average across the 5 section scores.
+
 
 We include multiple types of reward models in this evaluation:
 1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
@@ -92,5 +97,5 @@ For more details, see the [dataset](https://huggingface.co/datasets/allenai/rewa
 TOP_TEXT = """
 # RewardBench: Evaluating Reward Models
 ### Evaluating the capabilities, safety, and pitfalls of reward models
-[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | Paper (coming soon)
+[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787)
 """