Commit 87b1f9b by natolambert
Parent(s): 874c0c9
update about
src/md.py
CHANGED
@@ -12,6 +12,7 @@ We average over 4 core sections (per prompt weighting):
 
 For Reasoning, we increase the weight of the PRM-Math subset so code and math abilities are weighed equally in the final number, rather than increasing the relevance of code.
 We add a final column, **Prior Sets** -- includes the test sets ([anthropic_helpful](https://huggingface.co/datasets/Anthropic/hh-rlhf), [anthropic_hhh](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment), [shp](https://huggingface.co/datasets/stanfordnlp/SHP), [summarize](https://huggingface.co/datasets/openai/summarize_from_feedback))
+Prior sets is weighted 0.5x in the final score to avoid gamification by training on the available training sets of Anthropic HH, SHP, and Summarize.
 
 Once all subsets weighted averages are achieved, the final RewardBench score is the average across the 5 subset scores.
 
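The aggregation the diff describes -- a per-subset weighted average, with Prior Sets down-weighted 0.5x -- can be sketched as below. This is a minimal illustration, not the leaderboard's actual code: the subset names follow the text, the scores are placeholder values, and treating the 0.5 weight as a weight in a weighted mean (so it also shrinks the denominator) is an assumption about how "weighted 0.5x in the final score" is implemented.

```python
# Sketch of the final-score aggregation described in the diff.
# Scores here are illustrative placeholders, not real results.
subset_scores = {
    "Chat": 0.90,
    "Chat Hard": 0.60,
    "Safety": 0.85,
    "Reasoning": 0.75,
    "Prior Sets": 0.70,
}

# Prior Sets is weighted 0.5x to discourage gaming the score by
# training on its publicly available training splits.
weights = {name: (0.5 if name == "Prior Sets" else 1.0) for name in subset_scores}

# Assumed interpretation: a weighted mean over the 5 subset scores.
final_score = sum(weights[k] * subset_scores[k] for k in subset_scores) / sum(
    weights.values()
)
print(round(final_score, 4))
```

With these placeholder scores the Prior Sets subset contributes half as much as any core section, so a model cannot lift its overall rank much by overfitting the Anthropic HH, SHP, or Summarize training data.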