Spaces: allenai/reward-bench
natolambert committed: Merge branch 'main' of https://huggingface.co/spaces/allenai/reward-bench
README.md CHANGED

@@ -6,7 +6,6 @@ colorTo: blue
 sdk: gradio
 sdk_version: 4.12.0
 app_file: app.py
-header: mini
 pinned: false
 license: apache-2.0
 ---
@@ -16,4 +15,6 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 To develop this app, it can be run with:
 ```
 gradio app.py
-```
+```
+
+Paper: https://arxiv.org/abs/2403.13787
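For context on the `gradio app.py` development command in the README: it hot-reloads whatever Gradio demo the file defines. A minimal sketch of the entry point that command expects is below; it assumes only that app.py exposes a Blocks object named `demo` and that gradio is installed (the Space pins sdk_version 4.12.0). It is not the Space's actual code.

```python
# Minimal sketch of a Gradio entry point (not this Space's actual app.py).
# `gradio app.py` reloads the module and serves the Blocks object it finds;
# `python app.py` also works via the launch() call below.
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("# RewardBench: Evaluating Reward Models")

if __name__ == "__main__":
    demo.launch()
```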
app.py CHANGED

@@ -367,7 +367,7 @@ Warning, refusals, XSTest, and donotanswer datasets have sensitive content.""")
     with gr.Accordion("π Citation", open=False):
         citation_button = gr.Textbox(
             value=r"""@misc{RewardBench,
-    title={RewardBench: Evaluating Reward Models},
+    title={RewardBench: Evaluating Reward Models for Language Modeling},
     author={Lambert, Nathan and Pyatkin, Valentina and Morrison, Jacob and Miranda, LJ and Lin, Bill Yuchen and Chandu, Khyathi and Dziri, Nouha and Kumar, Sachin and Zick, Tom and Choi, Yejin and Smith, Noah A. and Hajishirzi, Hannaneh},
     year={2024},
     howpublished={\url{https://huggingface.co/spaces/allenai/reward-bench}
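The hunk above only changes the BibTeX title string inside the citation textbox. As a point of reference, a self-contained sketch of the same UI pattern (a read-only citation box inside a collapsed accordion) in Gradio 4.x could look like the following; the component names, labels, and shortened citation here are illustrative, not the app's actual structure.

```python
import gradio as gr

# Shortened placeholder citation; the app's full BibTeX entry is in the diff above.
CITATION = r"""@misc{RewardBench,
  title={RewardBench: Evaluating Reward Models for Language Modeling},
  author={Lambert, Nathan and others},
  year={2024},
  howpublished={\url{https://huggingface.co/spaces/allenai/reward-bench}},
}"""

with gr.Blocks() as demo:
    # A collapsed accordion keeps the citation out of the way until requested.
    with gr.Accordion("Citation", open=False):
        gr.Textbox(
            value=CITATION,        # read-only BibTeX shown to the user
            lines=7,
            label="BibTeX",
            show_copy_button=True,
        )

if __name__ == "__main__":
    demo.launch()
```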
src/logo.png CHANGED
src/md.py CHANGED

@@ -9,7 +9,12 @@ We average over 4 core sections (per prompt weighting):
 2. **Chat Hard**: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
 3. **Safety**: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
 4. **Reasoning**: Includes the code and math subsets (math-prm, hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
-
+
+For Reasoning, we increase the weight of the PRM-Math subset so code and math abilities are weighed equally in the final number, rather than increasing the relevance of code.
+We add a final column, **Prior Sets** -- includes the test sets ([anthropic_helpful](https://huggingface.co/datasets/Anthropic/hh-rlhf), [anthropic_hhh](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment), [shp](https://huggingface.co/datasets/stanfordnlp/SHP), [summarize](https://huggingface.co/datasets/openai/summarize_from_feedback))
+
+Once all subsets weighted averages are achieved, the final RewardBench score is the average across the 5 subset scores.
+
 
 We include multiple types of reward models in this evaluation:
 1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
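The weighting described in the added lines (math-prm up-weighted so math balances the HumanEvalPack code subsets, then a plain average over the five section scores) can be summarized in a short sketch. This is a reconstruction from the prose above, not the repository's implementation; subset names, accuracies, and prompt counts are placeholders.

```python
# Illustrative sketch of the scoring described above -- not the repository's code.
# Assumption: each subset has a per-prompt accuracy and a prompt count, and
# math-prm is re-weighted so math counts as much as all hep-* code subsets combined.

def reasoning_score(subset_acc: dict[str, float], subset_n: dict[str, int]) -> float:
    """Per-prompt weighted accuracy, with math-prm scaled to match all code prompts."""
    code_subsets = [s for s in subset_acc if s.startswith("hep-")]
    weights = {s: float(subset_n[s]) for s in code_subsets}
    weights["math-prm"] = sum(weights.values())  # math weighs as much as all code combined
    total = sum(weights.values())
    return sum(subset_acc[s] * w for s, w in weights.items()) / total

def rewardbench_score(section_scores: dict[str, float]) -> float:
    """Final score: unweighted mean over Chat, Chat Hard, Safety, Reasoning, Prior Sets."""
    return sum(section_scores.values()) / len(section_scores)

# Example with made-up numbers:
# reasoning_score({"math-prm": 0.80, "hep-python": 0.90, "hep-go": 0.70},
#                 {"math-prm": 447, "hep-python": 164, "hep-go": 164})
```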
@@ -92,5 +97,5 @@ For more details, see the [dataset](https://huggingface.co/datasets/allenai/rewa
 TOP_TEXT = """
 # RewardBench: Evaluating Reward Models
 ### Evaluating the capabilities, safety, and pitfalls of reward models
-[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | Paper
+[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787)
 """
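For the **Sequence Classifiers** model type listed in the first src/md.py hunk, a minimal sketch of scoring a single prompt-response pair with the HuggingFace API is shown below. The checkpoint name and chat formatting are placeholders; individual reward models define their own templates, and this is not the code the leaderboard itself runs.

```python
# Illustrative sketch: scoring one (prompt, response) pair with a sequence-classifier
# reward model via HuggingFace transformers. The checkpoint name is a placeholder,
# and a real reward model's tokenizer supplies its own chat template.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "some-org/some-reward-model"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

prompt = "How do I bake bread?"
response = "Start by mixing flour, water, yeast, and salt..."

# Join prompt and response with the model's chat template (when one is defined),
# then score the pair; classifier-style reward models output a single logit.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}, {"role": "assistant", "content": response}],
    tokenize=False,
)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()  # assumes a single-logit head
print(f"reward: {reward:.3f}")
```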