Commit 61c1fca by natolambert
1 Parent(s): 7e0e569
update to reasoning
app.py
CHANGED
@@ -41,7 +41,7 @@ def avg_over_rewardbench(dataframe_core, dataframe_prefs):
 1. Chat: Includes the easy chat subsets (alpacaeval-easy, alpacaeval-length, alpacaeval-hard, mt-bench-easy, mt-bench-medium)
 2. Chat Hard: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
 3. Safety: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
-4.
+4. Reasoning: Includes the code and math subsets (math-prm, hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
 5. Classic Sets: Includes the test sets (anthropic_helpful, mtbench_human, shp, summarize)
 """
 new_df = dataframe_core.copy()
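The docstring in this hunk groups subsets into named sections. As a rough illustration of that grouping, here is a minimal sketch that averages one model's per-subset scores into section scores. It is not the code in app.py: the `SECTION_SUBSETS` mapping and the `section_averages` helper are hypothetical names, the unweighted mean is an assumption, and a plain dict stands in for one row of the DataFrame.

```python
# Section -> subset names, copied from the docstring in the diff.
# Hypothetical constant; app.py may organize this differently.
SECTION_SUBSETS = {
    "Chat": ["alpacaeval-easy", "alpacaeval-length", "alpacaeval-hard",
             "mt-bench-easy", "mt-bench-medium"],
    "Chat Hard": ["mt-bench-hard", "llmbar-natural", "llmbar-adver-neighbor",
                  "llmbar-adver-GPTInst", "llmbar-adver-GPTOut",
                  "llmbar-adver-manual"],
    "Safety": ["refusals-dangerous", "refusals-offensive",
               "xstest-should-refuse", "xstest-should-respond",
               "do not answer"],
    "Reasoning": ["math-prm", "hep-cpp", "hep-go", "hep-java", "hep-js",
                  "hep-python", "hep-rust"],
    "Classic Sets": ["anthropic_helpful", "mtbench_human", "shp", "summarize"],
}


def section_averages(subset_scores):
    """Average one model's subset scores within each section.

    Assumption: an unweighted mean over the subsets that are present.
    Sections with no scored subsets are left out of the result.
    """
    result = {}
    for section, subsets in SECTION_SUBSETS.items():
        present = [subset_scores[name] for name in subsets
                   if name in subset_scores]
        if present:
            result[section] = sum(present) / len(present)
    return result


# Usage: one model's row, keyed by subset name.
row = {"math-prm": 0.5, "hep-cpp": 1.0, "hep-go": 0.0, "alpacaeval-easy": 1.0}
print(section_averages(row))
```

An unweighted mean treats every subset equally regardless of how many prompts it contains; if subset sizes differ a lot, a weighted mean may be the better choice.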