Commit 8d0d206 by Xiaowen-dg
Parent: 1fd9004

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +3252 -0
README.md CHANGED
@@ -2097,6 +2097,3258 @@ model-index:
2097
  Vulnerability Tsx async abort: Not affected
2098
 
2099
 
2100
+ Versions of relevant libraries:
2101
+
2102
+ [pip3] numpy==1.24.1
2103
+
2104
+ [pip3] torch==2.1.2
2105
+
2106
+ [pip3] torchaudio==2.0.2+cu118
2107
+
2108
+ [pip3] torchvision==0.15.2+cu118
2109
+
2110
+ [pip3] triton==2.1.0
2111
+
2112
+ [conda] Could not collect'
2113
+ transformers_version: 4.42.4
2114
+ - task:
2115
+ type: mmlu
2116
+ dataset:
2117
+ name: mmlu
2118
+ type: public-dataset
2119
+ metrics:
2120
+ - type: acc
2121
+ value: '0.595'
2122
+ args:
2123
+ results:
2124
+ mmlu:
2125
+ acc,none: 0.5817547357926222
2126
+ acc_stderr,none: 0.0039373066351597085
2127
+ alias: mmlu
2128
+ mmlu_humanities:
2129
+ alias: ' - humanities'
2130
+ acc,none: 0.5247608926673751
2131
+ acc_stderr,none: 0.006839745323517898
2132
+ mmlu_formal_logic:
2133
+ alias: ' - formal_logic'
2134
+ acc,none: 0.35714285714285715
2135
+ acc_stderr,none: 0.042857142857142816
2136
+ mmlu_high_school_european_history:
2137
+ alias: ' - high_school_european_history'
2138
+ acc,none: 0.696969696969697
2139
+ acc_stderr,none: 0.035886248000917075
2140
+ mmlu_high_school_us_history:
2141
+ alias: ' - high_school_us_history'
2142
+ acc,none: 0.7745098039215687
2143
+ acc_stderr,none: 0.02933116229425172
2144
+ mmlu_high_school_world_history:
2145
+ alias: ' - high_school_world_history'
2146
+ acc,none: 0.7974683544303798
2147
+ acc_stderr,none: 0.026160568246601453
2148
+ mmlu_international_law:
2149
+ alias: ' - international_law'
2150
+ acc,none: 0.7107438016528925
2151
+ acc_stderr,none: 0.041391127276354626
2152
+ mmlu_jurisprudence:
2153
+ alias: ' - jurisprudence'
2154
+ acc,none: 0.7037037037037037
2155
+ acc_stderr,none: 0.04414343666854932
2156
+ mmlu_logical_fallacies:
2157
+ alias: ' - logical_fallacies'
2158
+ acc,none: 0.7055214723926381
2159
+ acc_stderr,none: 0.03581165790474082
2160
+ mmlu_moral_disputes:
2161
+ alias: ' - moral_disputes'
2162
+ acc,none: 0.615606936416185
2163
+ acc_stderr,none: 0.026189666966272028
2164
+ mmlu_moral_scenarios:
2165
+ alias: ' - moral_scenarios'
2166
+ acc,none: 0.2837988826815642
2167
+ acc_stderr,none: 0.01507835897075178
2168
+ mmlu_philosophy:
2169
+ alias: ' - philosophy'
2170
+ acc,none: 0.6591639871382636
2171
+ acc_stderr,none: 0.02692084126077615
2172
+ mmlu_prehistory:
2173
+ alias: ' - prehistory'
2174
+ acc,none: 0.6666666666666666
2175
+ acc_stderr,none: 0.026229649178821163
2176
+ mmlu_professional_law:
2177
+ alias: ' - professional_law'
2178
+ acc,none: 0.4348109517601043
2179
+ acc_stderr,none: 0.012661233805616292
2180
+ mmlu_world_religions:
2181
+ alias: ' - world_religions'
2182
+ acc,none: 0.7602339181286549
2183
+ acc_stderr,none: 0.03274485211946956
2184
+ mmlu_other:
2185
+ alias: ' - other'
2186
+ acc,none: 0.6678467975539105
2187
+ acc_stderr,none: 0.008199669520892388
2188
+ mmlu_business_ethics:
2189
+ alias: ' - business_ethics'
2190
+ acc,none: 0.6
2191
+ acc_stderr,none: 0.049236596391733084
2192
+ mmlu_clinical_knowledge:
2193
+ alias: ' - clinical_knowledge'
2194
+ acc,none: 0.6943396226415094
2195
+ acc_stderr,none: 0.028353298073322663
2196
+ mmlu_college_medicine:
2197
+ alias: ' - college_medicine'
2198
+ acc,none: 0.5780346820809249
2199
+ acc_stderr,none: 0.03765746693865151
2200
+ mmlu_global_facts:
2201
+ alias: ' - global_facts'
2202
+ acc,none: 0.41
2203
+ acc_stderr,none: 0.04943110704237102
2204
+ mmlu_human_aging:
2205
+ alias: ' - human_aging'
2206
+ acc,none: 0.6681614349775785
2207
+ acc_stderr,none: 0.03160295143776679
2208
+ mmlu_management:
2209
+ alias: ' - management'
2210
+ acc,none: 0.7766990291262136
2211
+ acc_stderr,none: 0.04123553189891431
2212
+ mmlu_marketing:
2213
+ alias: ' - marketing'
2214
+ acc,none: 0.8076923076923077
2215
+ acc_stderr,none: 0.025819233256483706
2216
+ mmlu_medical_genetics:
2217
+ alias: ' - medical_genetics'
2218
+ acc,none: 0.7
2219
+ acc_stderr,none: 0.046056618647183814
2220
+ mmlu_miscellaneous:
2221
+ alias: ' - miscellaneous'
2222
+ acc,none: 0.7879948914431673
2223
+ acc_stderr,none: 0.014616099385833688
2224
+ mmlu_nutrition:
2225
+ alias: ' - nutrition'
2226
+ acc,none: 0.6503267973856209
2227
+ acc_stderr,none: 0.027305308076274695
2228
+ mmlu_professional_accounting:
2229
+ alias: ' - professional_accounting'
2230
+ acc,none: 0.46808510638297873
2231
+ acc_stderr,none: 0.02976667507587387
2232
+ mmlu_professional_medicine:
2233
+ alias: ' - professional_medicine'
2234
+ acc,none: 0.6360294117647058
2235
+ acc_stderr,none: 0.029227192460032032
2236
+ mmlu_virology:
2237
+ alias: ' - virology'
2238
+ acc,none: 0.4879518072289157
2239
+ acc_stderr,none: 0.038913644958358196
2240
+ mmlu_social_sciences:
2241
+ alias: ' - social_sciences'
2242
+ acc,none: 0.6785830354241144
2243
+ acc_stderr,none: 0.00821975248078532
2244
+ mmlu_econometrics:
2245
+ alias: ' - econometrics'
2246
+ acc,none: 0.43859649122807015
2247
+ acc_stderr,none: 0.04668000738510455
2248
+ mmlu_high_school_geography:
2249
+ alias: ' - high_school_geography'
2250
+ acc,none: 0.6868686868686869
2251
+ acc_stderr,none: 0.03304205087813652
2252
+ mmlu_high_school_government_and_politics:
2253
+ alias: ' - high_school_government_and_politics'
2254
+ acc,none: 0.8031088082901554
2255
+ acc_stderr,none: 0.028697873971860702
2256
+ mmlu_high_school_macroeconomics:
2257
+ alias: ' - high_school_macroeconomics'
2258
+ acc,none: 0.5153846153846153
2259
+ acc_stderr,none: 0.025339003010106515
2260
+ mmlu_high_school_microeconomics:
2261
+ alias: ' - high_school_microeconomics'
2262
+ acc,none: 0.6512605042016807
2263
+ acc_stderr,none: 0.030956636328566548
2264
+ mmlu_high_school_psychology:
2265
+ alias: ' - high_school_psychology'
2266
+ acc,none: 0.7669724770642202
2267
+ acc_stderr,none: 0.0181256691808615
2268
+ mmlu_human_sexuality:
2269
+ alias: ' - human_sexuality'
2270
+ acc,none: 0.7099236641221374
2271
+ acc_stderr,none: 0.03980066246467765
2272
+ mmlu_professional_psychology:
2273
+ alias: ' - professional_psychology'
2274
+ acc,none: 0.619281045751634
2275
+ acc_stderr,none: 0.019643801557924806
2276
+ mmlu_public_relations:
2277
+ alias: ' - public_relations'
2278
+ acc,none: 0.6727272727272727
2279
+ acc_stderr,none: 0.0449429086625209
2280
+ mmlu_security_studies:
2281
+ alias: ' - security_studies'
2282
+ acc,none: 0.726530612244898
2283
+ acc_stderr,none: 0.028535560337128445
2284
+ mmlu_sociology:
2285
+ alias: ' - sociology'
2286
+ acc,none: 0.8208955223880597
2287
+ acc_stderr,none: 0.027113286753111837
2288
+ mmlu_us_foreign_policy:
2289
+ alias: ' - us_foreign_policy'
2290
+ acc,none: 0.84
2291
+ acc_stderr,none: 0.03684529491774708
2292
+ mmlu_stem:
2293
+ alias: ' - stem'
2294
+ acc,none: 0.4874722486520774
2295
+ acc_stderr,none: 0.008583025767956746
2296
+ mmlu_abstract_algebra:
2297
+ alias: ' - abstract_algebra'
2298
+ acc,none: 0.31
2299
+ acc_stderr,none: 0.04648231987117316
2300
+ mmlu_anatomy:
2301
+ alias: ' - anatomy'
2302
+ acc,none: 0.5481481481481482
2303
+ acc_stderr,none: 0.04299268905480864
2304
+ mmlu_astronomy:
2305
+ alias: ' - astronomy'
2306
+ acc,none: 0.6118421052631579
2307
+ acc_stderr,none: 0.03965842097512744
2308
+ mmlu_college_biology:
2309
+ alias: ' - college_biology'
2310
+ acc,none: 0.7569444444444444
2311
+ acc_stderr,none: 0.03586879280080341
2312
+ mmlu_college_chemistry:
2313
+ alias: ' - college_chemistry'
2314
+ acc,none: 0.38
2315
+ acc_stderr,none: 0.04878317312145633
2316
+ mmlu_college_computer_science:
2317
+ alias: ' - college_computer_science'
2318
+ acc,none: 0.4
2319
+ acc_stderr,none: 0.049236596391733084
2320
+ mmlu_college_mathematics:
2321
+ alias: ' - college_mathematics'
2322
+ acc,none: 0.35
2323
+ acc_stderr,none: 0.04793724854411019
2324
+ mmlu_college_physics:
2325
+ alias: ' - college_physics'
2326
+ acc,none: 0.37254901960784315
2327
+ acc_stderr,none: 0.04810840148082633
2328
+ mmlu_computer_security:
2329
+ alias: ' - computer_security'
2330
+ acc,none: 0.67
2331
+ acc_stderr,none: 0.04725815626252609
2332
+ mmlu_conceptual_physics:
2333
+ alias: ' - conceptual_physics'
2334
+ acc,none: 0.5234042553191489
2335
+ acc_stderr,none: 0.032650194750335815
2336
+ mmlu_electrical_engineering:
2337
+ alias: ' - electrical_engineering'
2338
+ acc,none: 0.5172413793103449
2339
+ acc_stderr,none: 0.04164188720169375
2340
+ mmlu_elementary_mathematics:
2341
+ alias: ' - elementary_mathematics'
2342
+ acc,none: 0.373015873015873
2343
+ acc_stderr,none: 0.02490699045899257
2344
+ mmlu_high_school_biology:
2345
+ alias: ' - high_school_biology'
2346
+ acc,none: 0.7225806451612903
2347
+ acc_stderr,none: 0.02547019683590005
2348
+ mmlu_high_school_chemistry:
2349
+ alias: ' - high_school_chemistry'
2350
+ acc,none: 0.4630541871921182
2351
+ acc_stderr,none: 0.035083705204426656
2352
+ mmlu_high_school_computer_science:
2353
+ alias: ' - high_school_computer_science'
2354
+ acc,none: 0.62
2355
+ acc_stderr,none: 0.048783173121456316
2356
+ mmlu_high_school_mathematics:
2357
+ alias: ' - high_school_mathematics'
2358
+ acc,none: 0.32222222222222224
2359
+ acc_stderr,none: 0.028493465091028593
2360
+ mmlu_high_school_physics:
2361
+ alias: ' - high_school_physics'
2362
+ acc,none: 0.3576158940397351
2363
+ acc_stderr,none: 0.03913453431177258
2364
+ mmlu_high_school_statistics:
2365
+ alias: ' - high_school_statistics'
2366
+ acc,none: 0.4398148148148148
2367
+ acc_stderr,none: 0.033851779760448106
2368
+ mmlu_machine_learning:
2369
+ alias: ' - machine_learning'
2370
+ acc,none: 0.5089285714285714
2371
+ acc_stderr,none: 0.04745033255489123
2372
+ groups:
2373
+ mmlu:
2374
+ acc,none: 0.5817547357926222
2375
+ acc_stderr,none: 0.0039373066351597085
2376
+ alias: mmlu
2377
+ mmlu_humanities:
2378
+ alias: ' - humanities'
2379
+ acc,none: 0.5247608926673751
2380
+ acc_stderr,none: 0.006839745323517898
2381
+ mmlu_other:
2382
+ alias: ' - other'
2383
+ acc,none: 0.6678467975539105
2384
+ acc_stderr,none: 0.008199669520892388
2385
+ mmlu_social_sciences:
2386
+ alias: ' - social_sciences'
2387
+ acc,none: 0.6785830354241144
2388
+ acc_stderr,none: 0.00821975248078532
2389
+ mmlu_stem:
2390
+ alias: ' - stem'
2391
+ acc,none: 0.4874722486520774
2392
+ acc_stderr,none: 0.008583025767956746
2393
+ group_subtasks:
2394
+ mmlu_stem:
2395
+ - mmlu_college_computer_science
2396
+ - mmlu_college_chemistry
2397
+ - mmlu_college_biology
2398
+ - mmlu_astronomy
2399
+ - mmlu_anatomy
2400
+ - mmlu_abstract_algebra
2401
+ - mmlu_machine_learning
2402
+ - mmlu_high_school_statistics
2403
+ - mmlu_high_school_physics
2404
+ - mmlu_high_school_mathematics
2405
+ - mmlu_high_school_computer_science
2406
+ - mmlu_high_school_chemistry
2407
+ - mmlu_high_school_biology
2408
+ - mmlu_elementary_mathematics
2409
+ - mmlu_electrical_engineering
2410
+ - mmlu_conceptual_physics
2411
+ - mmlu_computer_security
2412
+ - mmlu_college_physics
2413
+ - mmlu_college_mathematics
2414
+ mmlu_other:
2415
+ - mmlu_clinical_knowledge
2416
+ - mmlu_business_ethics
2417
+ - mmlu_virology
2418
+ - mmlu_professional_medicine
2419
+ - mmlu_professional_accounting
2420
+ - mmlu_nutrition
2421
+ - mmlu_miscellaneous
2422
+ - mmlu_medical_genetics
2423
+ - mmlu_marketing
2424
+ - mmlu_management
2425
+ - mmlu_human_aging
2426
+ - mmlu_global_facts
2427
+ - mmlu_college_medicine
2428
+ mmlu_social_sciences:
2429
+ - mmlu_us_foreign_policy
2430
+ - mmlu_sociology
2431
+ - mmlu_security_studies
2432
+ - mmlu_public_relations
2433
+ - mmlu_professional_psychology
2434
+ - mmlu_human_sexuality
2435
+ - mmlu_high_school_psychology
2436
+ - mmlu_high_school_microeconomics
2437
+ - mmlu_high_school_macroeconomics
2438
+ - mmlu_high_school_government_and_politics
2439
+ - mmlu_high_school_geography
2440
+ - mmlu_econometrics
2441
+ mmlu_humanities:
2442
+ - mmlu_world_religions
2443
+ - mmlu_professional_law
2444
+ - mmlu_prehistory
2445
+ - mmlu_philosophy
2446
+ - mmlu_moral_scenarios
2447
+ - mmlu_moral_disputes
2448
+ - mmlu_logical_fallacies
2449
+ - mmlu_jurisprudence
2450
+ - mmlu_international_law
2451
+ - mmlu_high_school_world_history
2452
+ - mmlu_high_school_us_history
2453
+ - mmlu_high_school_european_history
2454
+ - mmlu_formal_logic
2455
+ mmlu:
2456
+ - mmlu_humanities
2457
+ - mmlu_social_sciences
2458
+ - mmlu_other
2459
+ - mmlu_stem
2460
+ configs:
2461
+ mmlu_abstract_algebra:
2462
+ task: mmlu_abstract_algebra
2463
+ task_alias: abstract_algebra
2464
+ group: mmlu_stem
2465
+ group_alias: stem
2466
+ dataset_path: hails/mmlu_no_train
2467
+ dataset_name: abstract_algebra
2468
+ test_split: test
2469
+ fewshot_split: dev
2470
+ doc_to_text: '{{question.strip()}}
2471
+
2472
+ A. {{choices[0]}}
2473
+
2474
+ B. {{choices[1]}}
2475
+
2476
+ C. {{choices[2]}}
2477
+
2478
+ D. {{choices[3]}}
2479
+
2480
+ Answer:'
2481
+ doc_to_target: answer
2482
+ doc_to_choice:
2483
+ - A
2484
+ - B
2485
+ - C
2486
+ - D
2487
+ description: 'The following are multiple choice questions (with answers)
2488
+ about abstract algebra.
2489
+
2490
+
2491
+ '
2492
+ target_delimiter: ' '
2493
+ fewshot_delimiter: '
2494
+
2495
+
2496
+ '
2497
+ fewshot_config:
2498
+ sampler: first_n
2499
+ metric_list:
2500
+ - metric: acc
2501
+ aggregation: mean
2502
+ higher_is_better: true
2503
+ output_type: multiple_choice
2504
+ repeats: 1
2505
+ should_decontaminate: false
2506
+ metadata:
2507
+ version: 0.0
2508
+ mmlu_anatomy:
2509
+ task: mmlu_anatomy
2510
+ task_alias: anatomy
2511
+ group: mmlu_stem
2512
+ group_alias: stem
2513
+ dataset_path: hails/mmlu_no_train
2514
+ dataset_name: anatomy
2515
+ test_split: test
2516
+ fewshot_split: dev
2517
+ doc_to_text: '{{question.strip()}}
2518
+
2519
+ A. {{choices[0]}}
2520
+
2521
+ B. {{choices[1]}}
2522
+
2523
+ C. {{choices[2]}}
2524
+
2525
+ D. {{choices[3]}}
2526
+
2527
+ Answer:'
2528
+ doc_to_target: answer
2529
+ doc_to_choice:
2530
+ - A
2531
+ - B
2532
+ - C
2533
+ - D
2534
+ description: 'The following are multiple choice questions (with answers)
2535
+ about anatomy.
2536
+
2537
+
2538
+ '
2539
+ target_delimiter: ' '
2540
+ fewshot_delimiter: '
2541
+
2542
+
2543
+ '
2544
+ fewshot_config:
2545
+ sampler: first_n
2546
+ metric_list:
2547
+ - metric: acc
2548
+ aggregation: mean
2549
+ higher_is_better: true
2550
+ output_type: multiple_choice
2551
+ repeats: 1
2552
+ should_decontaminate: false
2553
+ metadata:
2554
+ version: 0.0
2555
+ mmlu_astronomy:
2556
+ task: mmlu_astronomy
2557
+ task_alias: astronomy
2558
+ group: mmlu_stem
2559
+ group_alias: stem
2560
+ dataset_path: hails/mmlu_no_train
2561
+ dataset_name: astronomy
2562
+ test_split: test
2563
+ fewshot_split: dev
2564
+ doc_to_text: '{{question.strip()}}
2565
+
2566
+ A. {{choices[0]}}
2567
+
2568
+ B. {{choices[1]}}
2569
+
2570
+ C. {{choices[2]}}
2571
+
2572
+ D. {{choices[3]}}
2573
+
2574
+ Answer:'
2575
+ doc_to_target: answer
2576
+ doc_to_choice:
2577
+ - A
2578
+ - B
2579
+ - C
2580
+ - D
2581
+ description: 'The following are multiple choice questions (with answers)
2582
+ about astronomy.
2583
+
2584
+
2585
+ '
2586
+ target_delimiter: ' '
2587
+ fewshot_delimiter: '
2588
+
2589
+
2590
+ '
2591
+ fewshot_config:
2592
+ sampler: first_n
2593
+ metric_list:
2594
+ - metric: acc
2595
+ aggregation: mean
2596
+ higher_is_better: true
2597
+ output_type: multiple_choice
2598
+ repeats: 1
2599
+ should_decontaminate: false
2600
+ metadata:
2601
+ version: 0.0
2602
+ mmlu_business_ethics:
2603
+ task: mmlu_business_ethics
2604
+ task_alias: business_ethics
2605
+ group: mmlu_other
2606
+ group_alias: other
2607
+ dataset_path: hails/mmlu_no_train
2608
+ dataset_name: business_ethics
2609
+ test_split: test
2610
+ fewshot_split: dev
2611
+ doc_to_text: '{{question.strip()}}
2612
+
2613
+ A. {{choices[0]}}
2614
+
2615
+ B. {{choices[1]}}
2616
+
2617
+ C. {{choices[2]}}
2618
+
2619
+ D. {{choices[3]}}
2620
+
2621
+ Answer:'
2622
+ doc_to_target: answer
2623
+ doc_to_choice:
2624
+ - A
2625
+ - B
2626
+ - C
2627
+ - D
2628
+ description: 'The following are multiple choice questions (with answers)
2629
+ about business ethics.
2630
+
2631
+
2632
+ '
2633
+ target_delimiter: ' '
2634
+ fewshot_delimiter: '
2635
+
2636
+
2637
+ '
2638
+ fewshot_config:
2639
+ sampler: first_n
2640
+ metric_list:
2641
+ - metric: acc
2642
+ aggregation: mean
2643
+ higher_is_better: true
2644
+ output_type: multiple_choice
2645
+ repeats: 1
2646
+ should_decontaminate: false
2647
+ metadata:
2648
+ version: 0.0
2649
+ mmlu_clinical_knowledge:
2650
+ task: mmlu_clinical_knowledge
2651
+ task_alias: clinical_knowledge
2652
+ group: mmlu_other
2653
+ group_alias: other
2654
+ dataset_path: hails/mmlu_no_train
2655
+ dataset_name: clinical_knowledge
2656
+ test_split: test
2657
+ fewshot_split: dev
2658
+ doc_to_text: '{{question.strip()}}
2659
+
2660
+ A. {{choices[0]}}
2661
+
2662
+ B. {{choices[1]}}
2663
+
2664
+ C. {{choices[2]}}
2665
+
2666
+ D. {{choices[3]}}
2667
+
2668
+ Answer:'
2669
+ doc_to_target: answer
2670
+ doc_to_choice:
2671
+ - A
2672
+ - B
2673
+ - C
2674
+ - D
2675
+ description: 'The following are multiple choice questions (with answers)
2676
+ about clinical knowledge.
2677
+
2678
+
2679
+ '
2680
+ target_delimiter: ' '
2681
+ fewshot_delimiter: '
2682
+
2683
+
2684
+ '
2685
+ fewshot_config:
2686
+ sampler: first_n
2687
+ metric_list:
2688
+ - metric: acc
2689
+ aggregation: mean
2690
+ higher_is_better: true
2691
+ output_type: multiple_choice
2692
+ repeats: 1
2693
+ should_decontaminate: false
2694
+ metadata:
2695
+ version: 0.0
2696
+ mmlu_college_biology:
2697
+ task: mmlu_college_biology
2698
+ task_alias: college_biology
2699
+ group: mmlu_stem
2700
+ group_alias: stem
2701
+ dataset_path: hails/mmlu_no_train
2702
+ dataset_name: college_biology
2703
+ test_split: test
2704
+ fewshot_split: dev
2705
+ doc_to_text: '{{question.strip()}}
2706
+
2707
+ A. {{choices[0]}}
2708
+
2709
+ B. {{choices[1]}}
2710
+
2711
+ C. {{choices[2]}}
2712
+
2713
+ D. {{choices[3]}}
2714
+
2715
+ Answer:'
2716
+ doc_to_target: answer
2717
+ doc_to_choice:
2718
+ - A
2719
+ - B
2720
+ - C
2721
+ - D
2722
+ description: 'The following are multiple choice questions (with answers)
2723
+ about college biology.
2724
+
2725
+
2726
+ '
2727
+ target_delimiter: ' '
2728
+ fewshot_delimiter: '
2729
+
2730
+
2731
+ '
2732
+ fewshot_config:
2733
+ sampler: first_n
2734
+ metric_list:
2735
+ - metric: acc
2736
+ aggregation: mean
2737
+ higher_is_better: true
2738
+ output_type: multiple_choice
2739
+ repeats: 1
2740
+ should_decontaminate: false
2741
+ metadata:
2742
+ version: 0.0
2743
+ mmlu_college_chemistry:
2744
+ task: mmlu_college_chemistry
2745
+ task_alias: college_chemistry
2746
+ group: mmlu_stem
2747
+ group_alias: stem
2748
+ dataset_path: hails/mmlu_no_train
2749
+ dataset_name: college_chemistry
2750
+ test_split: test
2751
+ fewshot_split: dev
2752
+ doc_to_text: '{{question.strip()}}
2753
+
2754
+ A. {{choices[0]}}
2755
+
2756
+ B. {{choices[1]}}
2757
+
2758
+ C. {{choices[2]}}
2759
+
2760
+ D. {{choices[3]}}
2761
+
2762
+ Answer:'
2763
+ doc_to_target: answer
2764
+ doc_to_choice:
2765
+ - A
2766
+ - B
2767
+ - C
2768
+ - D
2769
+ description: 'The following are multiple choice questions (with answers)
2770
+ about college chemistry.
2771
+
2772
+
2773
+ '
2774
+ target_delimiter: ' '
2775
+ fewshot_delimiter: '
2776
+
2777
+
2778
+ '
2779
+ fewshot_config:
2780
+ sampler: first_n
2781
+ metric_list:
2782
+ - metric: acc
2783
+ aggregation: mean
2784
+ higher_is_better: true
2785
+ output_type: multiple_choice
2786
+ repeats: 1
2787
+ should_decontaminate: false
2788
+ metadata:
2789
+ version: 0.0
2790
+ mmlu_college_computer_science:
2791
+ task: mmlu_college_computer_science
2792
+ task_alias: college_computer_science
2793
+ group: mmlu_stem
2794
+ group_alias: stem
2795
+ dataset_path: hails/mmlu_no_train
2796
+ dataset_name: college_computer_science
2797
+ test_split: test
2798
+ fewshot_split: dev
2799
+ doc_to_text: '{{question.strip()}}
2800
+
2801
+ A. {{choices[0]}}
2802
+
2803
+ B. {{choices[1]}}
2804
+
2805
+ C. {{choices[2]}}
2806
+
2807
+ D. {{choices[3]}}
2808
+
2809
+ Answer:'
2810
+ doc_to_target: answer
2811
+ doc_to_choice:
2812
+ - A
2813
+ - B
2814
+ - C
2815
+ - D
2816
+ description: 'The following are multiple choice questions (with answers)
2817
+ about college computer science.
2818
+
2819
+
2820
+ '
2821
+ target_delimiter: ' '
2822
+ fewshot_delimiter: '
2823
+
2824
+
2825
+ '
2826
+ fewshot_config:
2827
+ sampler: first_n
2828
+ metric_list:
2829
+ - metric: acc
2830
+ aggregation: mean
2831
+ higher_is_better: true
2832
+ output_type: multiple_choice
2833
+ repeats: 1
2834
+ should_decontaminate: false
2835
+ metadata:
2836
+ version: 0.0
2837
+ mmlu_college_mathematics:
2838
+ task: mmlu_college_mathematics
2839
+ task_alias: college_mathematics
2840
+ group: mmlu_stem
2841
+ group_alias: stem
2842
+ dataset_path: hails/mmlu_no_train
2843
+ dataset_name: college_mathematics
2844
+ test_split: test
2845
+ fewshot_split: dev
2846
+ doc_to_text: '{{question.strip()}}
2847
+
2848
+ A. {{choices[0]}}
2849
+
2850
+ B. {{choices[1]}}
2851
+
2852
+ C. {{choices[2]}}
2853
+
2854
+ D. {{choices[3]}}
2855
+
2856
+ Answer:'
2857
+ doc_to_target: answer
2858
+ doc_to_choice:
2859
+ - A
2860
+ - B
2861
+ - C
2862
+ - D
2863
+ description: 'The following are multiple choice questions (with answers)
2864
+ about college mathematics.
2865
+
2866
+
2867
+ '
2868
+ target_delimiter: ' '
2869
+ fewshot_delimiter: '
2870
+
2871
+
2872
+ '
2873
+ fewshot_config:
2874
+ sampler: first_n
2875
+ metric_list:
2876
+ - metric: acc
2877
+ aggregation: mean
2878
+ higher_is_better: true
2879
+ output_type: multiple_choice
2880
+ repeats: 1
2881
+ should_decontaminate: false
2882
+ metadata:
2883
+ version: 0.0
2884
+ mmlu_college_medicine:
2885
+ task: mmlu_college_medicine
2886
+ task_alias: college_medicine
2887
+ group: mmlu_other
2888
+ group_alias: other
2889
+ dataset_path: hails/mmlu_no_train
2890
+ dataset_name: college_medicine
2891
+ test_split: test
2892
+ fewshot_split: dev
2893
+ doc_to_text: '{{question.strip()}}
2894
+
2895
+ A. {{choices[0]}}
2896
+
2897
+ B. {{choices[1]}}
2898
+
2899
+ C. {{choices[2]}}
2900
+
2901
+ D. {{choices[3]}}
2902
+
2903
+ Answer:'
2904
+ doc_to_target: answer
2905
+ doc_to_choice:
2906
+ - A
2907
+ - B
2908
+ - C
2909
+ - D
2910
+ description: 'The following are multiple choice questions (with answers)
2911
+ about college medicine.
2912
+
2913
+
2914
+ '
2915
+ target_delimiter: ' '
2916
+ fewshot_delimiter: '
2917
+
2918
+
2919
+ '
2920
+ fewshot_config:
2921
+ sampler: first_n
2922
+ metric_list:
2923
+ - metric: acc
2924
+ aggregation: mean
2925
+ higher_is_better: true
2926
+ output_type: multiple_choice
2927
+ repeats: 1
2928
+ should_decontaminate: false
2929
+ metadata:
2930
+ version: 0.0
2931
+ mmlu_college_physics:
2932
+ task: mmlu_college_physics
2933
+ task_alias: college_physics
2934
+ group: mmlu_stem
2935
+ group_alias: stem
2936
+ dataset_path: hails/mmlu_no_train
2937
+ dataset_name: college_physics
2938
+ test_split: test
2939
+ fewshot_split: dev
2940
+ doc_to_text: '{{question.strip()}}
2941
+
2942
+ A. {{choices[0]}}
2943
+
2944
+ B. {{choices[1]}}
2945
+
2946
+ C. {{choices[2]}}
2947
+
2948
+ D. {{choices[3]}}
2949
+
2950
+ Answer:'
2951
+ doc_to_target: answer
2952
+ doc_to_choice:
2953
+ - A
2954
+ - B
2955
+ - C
2956
+ - D
2957
+ description: 'The following are multiple choice questions (with answers)
2958
+ about college physics.
2959
+
2960
+
2961
+ '
2962
+ target_delimiter: ' '
2963
+ fewshot_delimiter: '
2964
+
2965
+
2966
+ '
2967
+ fewshot_config:
2968
+ sampler: first_n
2969
+ metric_list:
2970
+ - metric: acc
2971
+ aggregation: mean
2972
+ higher_is_better: true
2973
+ output_type: multiple_choice
2974
+ repeats: 1
2975
+ should_decontaminate: false
2976
+ metadata:
2977
+ version: 0.0
2978
+ mmlu_computer_security:
2979
+ task: mmlu_computer_security
2980
+ task_alias: computer_security
2981
+ group: mmlu_stem
2982
+ group_alias: stem
2983
+ dataset_path: hails/mmlu_no_train
2984
+ dataset_name: computer_security
2985
+ test_split: test
2986
+ fewshot_split: dev
2987
+ doc_to_text: '{{question.strip()}}
2988
+
2989
+ A. {{choices[0]}}
2990
+
2991
+ B. {{choices[1]}}
2992
+
2993
+ C. {{choices[2]}}
2994
+
2995
+ D. {{choices[3]}}
2996
+
2997
+ Answer:'
2998
+ doc_to_target: answer
2999
+ doc_to_choice:
3000
+ - A
3001
+ - B
3002
+ - C
3003
+ - D
3004
+ description: 'The following are multiple choice questions (with answers)
3005
+ about computer security.
3006
+
3007
+
3008
+ '
3009
+ target_delimiter: ' '
3010
+ fewshot_delimiter: '
3011
+
3012
+
3013
+ '
3014
+ fewshot_config:
3015
+ sampler: first_n
3016
+ metric_list:
3017
+ - metric: acc
3018
+ aggregation: mean
3019
+ higher_is_better: true
3020
+ output_type: multiple_choice
3021
+ repeats: 1
3022
+ should_decontaminate: false
3023
+ metadata:
3024
+ version: 0.0
3025
+ mmlu_conceptual_physics:
3026
+ task: mmlu_conceptual_physics
3027
+ task_alias: conceptual_physics
3028
+ group: mmlu_stem
3029
+ group_alias: stem
3030
+ dataset_path: hails/mmlu_no_train
3031
+ dataset_name: conceptual_physics
3032
+ test_split: test
3033
+ fewshot_split: dev
3034
+ doc_to_text: '{{question.strip()}}
3035
+
3036
+ A. {{choices[0]}}
3037
+
3038
+ B. {{choices[1]}}
3039
+
3040
+ C. {{choices[2]}}
3041
+
3042
+ D. {{choices[3]}}
3043
+
3044
+ Answer:'
3045
+ doc_to_target: answer
3046
+ doc_to_choice:
3047
+ - A
3048
+ - B
3049
+ - C
3050
+ - D
3051
+ description: 'The following are multiple choice questions (with answers)
3052
+ about conceptual physics.
3053
+
3054
+
3055
+ '
3056
+ target_delimiter: ' '
3057
+ fewshot_delimiter: '
3058
+
3059
+
3060
+ '
3061
+ fewshot_config:
3062
+ sampler: first_n
3063
+ metric_list:
3064
+ - metric: acc
3065
+ aggregation: mean
3066
+ higher_is_better: true
3067
+ output_type: multiple_choice
3068
+ repeats: 1
3069
+ should_decontaminate: false
3070
+ metadata:
3071
+ version: 0.0
3072
+ mmlu_econometrics:
3073
+ task: mmlu_econometrics
3074
+ task_alias: econometrics
3075
+ group: mmlu_social_sciences
3076
+ group_alias: social_sciences
3077
+ dataset_path: hails/mmlu_no_train
3078
+ dataset_name: econometrics
3079
+ test_split: test
3080
+ fewshot_split: dev
3081
+ doc_to_text: '{{question.strip()}}
3082
+
3083
+ A. {{choices[0]}}
3084
+
3085
+ B. {{choices[1]}}
3086
+
3087
+ C. {{choices[2]}}
3088
+
3089
+ D. {{choices[3]}}
3090
+
3091
+ Answer:'
3092
+ doc_to_target: answer
3093
+ doc_to_choice:
3094
+ - A
3095
+ - B
3096
+ - C
3097
+ - D
3098
+ description: 'The following are multiple choice questions (with answers)
3099
+ about econometrics.
3100
+
3101
+
3102
+ '
3103
+ target_delimiter: ' '
3104
+ fewshot_delimiter: '
3105
+
3106
+
3107
+ '
3108
+ fewshot_config:
3109
+ sampler: first_n
3110
+ metric_list:
3111
+ - metric: acc
3112
+ aggregation: mean
3113
+ higher_is_better: true
3114
+ output_type: multiple_choice
3115
+ repeats: 1
3116
+ should_decontaminate: false
3117
+ metadata:
3118
+ version: 0.0
3119
+ mmlu_electrical_engineering:
3120
+ task: mmlu_electrical_engineering
3121
+ task_alias: electrical_engineering
3122
+ group: mmlu_stem
3123
+ group_alias: stem
3124
+ dataset_path: hails/mmlu_no_train
3125
+ dataset_name: electrical_engineering
3126
+ test_split: test
3127
+ fewshot_split: dev
3128
+ doc_to_text: '{{question.strip()}}
3129
+
3130
+ A. {{choices[0]}}
3131
+
3132
+ B. {{choices[1]}}
3133
+
3134
+ C. {{choices[2]}}
3135
+
3136
+ D. {{choices[3]}}
3137
+
3138
+ Answer:'
3139
+ doc_to_target: answer
3140
+ doc_to_choice:
3141
+ - A
3142
+ - B
3143
+ - C
3144
+ - D
3145
+ description: 'The following are multiple choice questions (with answers)
3146
+ about electrical engineering.
3147
+
3148
+
3149
+ '
3150
+ target_delimiter: ' '
3151
+ fewshot_delimiter: '
3152
+
3153
+
3154
+ '
3155
+ fewshot_config:
3156
+ sampler: first_n
3157
+ metric_list:
3158
+ - metric: acc
3159
+ aggregation: mean
3160
+ higher_is_better: true
3161
+ output_type: multiple_choice
3162
+ repeats: 1
3163
+ should_decontaminate: false
3164
+ metadata:
3165
+ version: 0.0
3166
+ mmlu_elementary_mathematics:
3167
+ task: mmlu_elementary_mathematics
3168
+ task_alias: elementary_mathematics
3169
+ group: mmlu_stem
3170
+ group_alias: stem
3171
+ dataset_path: hails/mmlu_no_train
3172
+ dataset_name: elementary_mathematics
3173
+ test_split: test
3174
+ fewshot_split: dev
3175
+ doc_to_text: '{{question.strip()}}
3176
+
3177
+ A. {{choices[0]}}
3178
+
3179
+ B. {{choices[1]}}
3180
+
3181
+ C. {{choices[2]}}
3182
+
3183
+ D. {{choices[3]}}
3184
+
3185
+ Answer:'
3186
+ doc_to_target: answer
3187
+ doc_to_choice:
3188
+ - A
3189
+ - B
3190
+ - C
3191
+ - D
3192
+ description: 'The following are multiple choice questions (with answers)
3193
+ about elementary mathematics.
3194
+
3195
+
3196
+ '
3197
+ target_delimiter: ' '
3198
+ fewshot_delimiter: '
3199
+
3200
+
3201
+ '
3202
+ fewshot_config:
3203
+ sampler: first_n
3204
+ metric_list:
3205
+ - metric: acc
3206
+ aggregation: mean
3207
+ higher_is_better: true
3208
+ output_type: multiple_choice
3209
+ repeats: 1
3210
+ should_decontaminate: false
3211
+ metadata:
3212
+ version: 0.0
3213
+ mmlu_formal_logic:
3214
+ task: mmlu_formal_logic
3215
+ task_alias: formal_logic
3216
+ group: mmlu_humanities
3217
+ group_alias: humanities
3218
+ dataset_path: hails/mmlu_no_train
3219
+ dataset_name: formal_logic
3220
+ test_split: test
3221
+ fewshot_split: dev
3222
+ doc_to_text: '{{question.strip()}}
3223
+
3224
+ A. {{choices[0]}}
3225
+
3226
+ B. {{choices[1]}}
3227
+
3228
+ C. {{choices[2]}}
3229
+
3230
+ D. {{choices[3]}}
3231
+
3232
+ Answer:'
3233
+ doc_to_target: answer
3234
+ doc_to_choice:
3235
+ - A
3236
+ - B
3237
+ - C
3238
+ - D
3239
+ description: 'The following are multiple choice questions (with answers)
3240
+ about formal logic.
3241
+
3242
+
3243
+ '
3244
+ target_delimiter: ' '
3245
+ fewshot_delimiter: '
3246
+
3247
+
3248
+ '
3249
+ fewshot_config:
3250
+ sampler: first_n
3251
+ metric_list:
3252
+ - metric: acc
3253
+ aggregation: mean
3254
+ higher_is_better: true
3255
+ output_type: multiple_choice
3256
+ repeats: 1
3257
+ should_decontaminate: false
3258
+ metadata:
3259
+ version: 0.0
3260
+ mmlu_global_facts:
3261
+ task: mmlu_global_facts
3262
+ task_alias: global_facts
3263
+ group: mmlu_other
3264
+ group_alias: other
3265
+ dataset_path: hails/mmlu_no_train
3266
+ dataset_name: global_facts
3267
+ test_split: test
3268
+ fewshot_split: dev
3269
+ doc_to_text: '{{question.strip()}}
3270
+
3271
+ A. {{choices[0]}}
3272
+
3273
+ B. {{choices[1]}}
3274
+
3275
+ C. {{choices[2]}}
3276
+
3277
+ D. {{choices[3]}}
3278
+
3279
+ Answer:'
3280
+ doc_to_target: answer
3281
+ doc_to_choice:
3282
+ - A
3283
+ - B
3284
+ - C
3285
+ - D
3286
+ description: 'The following are multiple choice questions (with answers)
3287
+ about global facts.
3288
+
3289
+
3290
+ '
3291
+ target_delimiter: ' '
3292
+ fewshot_delimiter: '
3293
+
3294
+
3295
+ '
3296
+ fewshot_config:
3297
+ sampler: first_n
3298
+ metric_list:
3299
+ - metric: acc
3300
+ aggregation: mean
3301
+ higher_is_better: true
3302
+ output_type: multiple_choice
3303
+ repeats: 1
3304
+ should_decontaminate: false
3305
+ metadata:
3306
+ version: 0.0
3307
+ mmlu_high_school_biology:
3308
+ task: mmlu_high_school_biology
3309
+ task_alias: high_school_biology
3310
+ group: mmlu_stem
3311
+ group_alias: stem
3312
+ dataset_path: hails/mmlu_no_train
3313
+ dataset_name: high_school_biology
3314
+ test_split: test
3315
+ fewshot_split: dev
3316
+ doc_to_text: '{{question.strip()}}
3317
+
3318
+ A. {{choices[0]}}
3319
+
3320
+ B. {{choices[1]}}
3321
+
3322
+ C. {{choices[2]}}
3323
+
3324
+ D. {{choices[3]}}
3325
+
3326
+ Answer:'
3327
+ doc_to_target: answer
3328
+ doc_to_choice:
3329
+ - A
3330
+ - B
3331
+ - C
3332
+ - D
3333
+ description: 'The following are multiple choice questions (with answers)
3334
+ about high school biology.
3335
+
3336
+
3337
+ '
3338
+ target_delimiter: ' '
3339
+ fewshot_delimiter: '
3340
+
3341
+
3342
+ '
3343
+ fewshot_config:
3344
+ sampler: first_n
3345
+ metric_list:
3346
+ - metric: acc
3347
+ aggregation: mean
3348
+ higher_is_better: true
3349
+ output_type: multiple_choice
3350
+ repeats: 1
3351
+ should_decontaminate: false
3352
+ metadata:
3353
+ version: 0.0
3354
+ mmlu_high_school_chemistry:
3355
+ task: mmlu_high_school_chemistry
3356
+ task_alias: high_school_chemistry
3357
+ group: mmlu_stem
3358
+ group_alias: stem
3359
+ dataset_path: hails/mmlu_no_train
3360
+ dataset_name: high_school_chemistry
3361
+ test_split: test
3362
+ fewshot_split: dev
3363
+ doc_to_text: '{{question.strip()}}
3364
+
3365
+ A. {{choices[0]}}
3366
+
3367
+ B. {{choices[1]}}
3368
+
3369
+ C. {{choices[2]}}
3370
+
3371
+ D. {{choices[3]}}
3372
+
3373
+ Answer:'
3374
+ doc_to_target: answer
3375
+ doc_to_choice:
3376
+ - A
3377
+ - B
3378
+ - C
3379
+ - D
3380
+ description: 'The following are multiple choice questions (with answers)
3381
+ about high school chemistry.
3382
+
3383
+
3384
+ '
3385
+ target_delimiter: ' '
3386
+ fewshot_delimiter: '
3387
+
3388
+
3389
+ '
3390
+ fewshot_config:
3391
+ sampler: first_n
3392
+ metric_list:
3393
+ - metric: acc
3394
+ aggregation: mean
3395
+ higher_is_better: true
3396
+ output_type: multiple_choice
3397
+ repeats: 1
3398
+ should_decontaminate: false
3399
+ metadata:
3400
+ version: 0.0
3401
+ mmlu_high_school_computer_science:
3402
+ task: mmlu_high_school_computer_science
3403
+ task_alias: high_school_computer_science
3404
+ group: mmlu_stem
3405
+ group_alias: stem
3406
+ dataset_path: hails/mmlu_no_train
3407
+ dataset_name: high_school_computer_science
3408
+ test_split: test
3409
+ fewshot_split: dev
3410
+ doc_to_text: '{{question.strip()}}
3411
+
3412
+ A. {{choices[0]}}
3413
+
3414
+ B. {{choices[1]}}
3415
+
3416
+ C. {{choices[2]}}
3417
+
3418
+ D. {{choices[3]}}
3419
+
3420
+ Answer:'
3421
+ doc_to_target: answer
3422
+ doc_to_choice:
3423
+ - A
3424
+ - B
3425
+ - C
3426
+ - D
3427
+ description: 'The following are multiple choice questions (with answers)
3428
+ about high school computer science.
3429
+
3430
+
3431
+ '
3432
+ target_delimiter: ' '
3433
+ fewshot_delimiter: '
3434
+
3435
+
3436
+ '
3437
+ fewshot_config:
3438
+ sampler: first_n
3439
+ metric_list:
3440
+ - metric: acc
3441
+ aggregation: mean
3442
+ higher_is_better: true
3443
+ output_type: multiple_choice
3444
+ repeats: 1
3445
+ should_decontaminate: false
3446
+ metadata:
3447
+ version: 0.0
3448
+ mmlu_high_school_european_history:
3449
+ task: mmlu_high_school_european_history
3450
+ task_alias: high_school_european_history
3451
+ group: mmlu_humanities
3452
+ group_alias: humanities
3453
+ dataset_path: hails/mmlu_no_train
3454
+ dataset_name: high_school_european_history
3455
+ test_split: test
3456
+ fewshot_split: dev
3457
+ doc_to_text: '{{question.strip()}}
3458
+
3459
+ A. {{choices[0]}}
3460
+
3461
+ B. {{choices[1]}}
3462
+
3463
+ C. {{choices[2]}}
3464
+
3465
+ D. {{choices[3]}}
3466
+
3467
+ Answer:'
3468
+ doc_to_target: answer
3469
+ doc_to_choice:
3470
+ - A
3471
+ - B
3472
+ - C
3473
+ - D
3474
+ description: 'The following are multiple choice questions (with answers)
3475
+ about high school european history.
3476
+
3477
+
3478
+ '
3479
+ target_delimiter: ' '
3480
+ fewshot_delimiter: '
3481
+
3482
+
3483
+ '
3484
+ fewshot_config:
3485
+ sampler: first_n
3486
+ metric_list:
3487
+ - metric: acc
3488
+ aggregation: mean
3489
+ higher_is_better: true
3490
+ output_type: multiple_choice
3491
+ repeats: 1
3492
+ should_decontaminate: false
3493
+ metadata:
3494
+ version: 0.0
3495
+ mmlu_high_school_geography:
3496
+ task: mmlu_high_school_geography
3497
+ task_alias: high_school_geography
3498
+ group: mmlu_social_sciences
3499
+ group_alias: social_sciences
3500
+ dataset_path: hails/mmlu_no_train
3501
+ dataset_name: high_school_geography
3502
+ test_split: test
3503
+ fewshot_split: dev
3504
+ doc_to_text: '{{question.strip()}}
3505
+
3506
+ A. {{choices[0]}}
3507
+
3508
+ B. {{choices[1]}}
3509
+
3510
+ C. {{choices[2]}}
3511
+
3512
+ D. {{choices[3]}}
3513
+
3514
+ Answer:'
3515
+ doc_to_target: answer
3516
+ doc_to_choice:
3517
+ - A
3518
+ - B
3519
+ - C
3520
+ - D
3521
+ description: 'The following are multiple choice questions (with answers)
3522
+ about high school geography.
3523
+
3524
+
3525
+ '
3526
+ target_delimiter: ' '
3527
+ fewshot_delimiter: '
3528
+
3529
+
3530
+ '
3531
+ fewshot_config:
3532
+ sampler: first_n
3533
+ metric_list:
3534
+ - metric: acc
3535
+ aggregation: mean
3536
+ higher_is_better: true
3537
+ output_type: multiple_choice
3538
+ repeats: 1
3539
+ should_decontaminate: false
3540
+ metadata:
3541
+ version: 0.0
3542
+ mmlu_high_school_government_and_politics:
3543
+ task: mmlu_high_school_government_and_politics
3544
+ task_alias: high_school_government_and_politics
3545
+ group: mmlu_social_sciences
3546
+ group_alias: social_sciences
3547
+ dataset_path: hails/mmlu_no_train
3548
+ dataset_name: high_school_government_and_politics
3549
+ test_split: test
3550
+ fewshot_split: dev
3551
+ doc_to_text: '{{question.strip()}}
3552
+
3553
+ A. {{choices[0]}}
3554
+
3555
+ B. {{choices[1]}}
3556
+
3557
+ C. {{choices[2]}}
3558
+
3559
+ D. {{choices[3]}}
3560
+
3561
+ Answer:'
3562
+ doc_to_target: answer
3563
+ doc_to_choice:
3564
+ - A
3565
+ - B
3566
+ - C
3567
+ - D
3568
+ description: 'The following are multiple choice questions (with answers)
3569
+ about high school government and politics.
3570
+
3571
+
3572
+ '
3573
+ target_delimiter: ' '
3574
+ fewshot_delimiter: '
3575
+
3576
+
3577
+ '
3578
+ fewshot_config:
3579
+ sampler: first_n
3580
+ metric_list:
3581
+ - metric: acc
3582
+ aggregation: mean
3583
+ higher_is_better: true
3584
+ output_type: multiple_choice
3585
+ repeats: 1
3586
+ should_decontaminate: false
3587
+ metadata:
3588
+ version: 0.0
3589
+ mmlu_high_school_macroeconomics:
3590
+ task: mmlu_high_school_macroeconomics
3591
+ task_alias: high_school_macroeconomics
3592
+ group: mmlu_social_sciences
3593
+ group_alias: social_sciences
3594
+ dataset_path: hails/mmlu_no_train
3595
+ dataset_name: high_school_macroeconomics
3596
+ test_split: test
3597
+ fewshot_split: dev
3598
+ doc_to_text: '{{question.strip()}}
3599
+
3600
+ A. {{choices[0]}}
3601
+
3602
+ B. {{choices[1]}}
3603
+
3604
+ C. {{choices[2]}}
3605
+
3606
+ D. {{choices[3]}}
3607
+
3608
+ Answer:'
3609
+ doc_to_target: answer
3610
+ doc_to_choice:
3611
+ - A
3612
+ - B
3613
+ - C
3614
+ - D
3615
+ description: 'The following are multiple choice questions (with answers)
3616
+ about high school macroeconomics.
3617
+
3618
+
3619
+ '
3620
+ target_delimiter: ' '
3621
+ fewshot_delimiter: '
3622
+
3623
+
3624
+ '
3625
+ fewshot_config:
3626
+ sampler: first_n
3627
+ metric_list:
3628
+ - metric: acc
3629
+ aggregation: mean
3630
+ higher_is_better: true
3631
+ output_type: multiple_choice
3632
+ repeats: 1
3633
+ should_decontaminate: false
3634
+ metadata:
3635
+ version: 0.0
3636
+ mmlu_high_school_mathematics:
3637
+ task: mmlu_high_school_mathematics
3638
+ task_alias: high_school_mathematics
3639
+ group: mmlu_stem
3640
+ group_alias: stem
3641
+ dataset_path: hails/mmlu_no_train
3642
+ dataset_name: high_school_mathematics
3643
+ test_split: test
3644
+ fewshot_split: dev
3645
+ doc_to_text: '{{question.strip()}}
3646
+
3647
+ A. {{choices[0]}}
3648
+
3649
+ B. {{choices[1]}}
3650
+
3651
+ C. {{choices[2]}}
3652
+
3653
+ D. {{choices[3]}}
3654
+
3655
+ Answer:'
3656
+ doc_to_target: answer
3657
+ doc_to_choice:
3658
+ - A
3659
+ - B
3660
+ - C
3661
+ - D
3662
+ description: 'The following are multiple choice questions (with answers)
3663
+ about high school mathematics.
3664
+
3665
+
3666
+ '
3667
+ target_delimiter: ' '
3668
+ fewshot_delimiter: '
3669
+
3670
+
3671
+ '
3672
+ fewshot_config:
3673
+ sampler: first_n
3674
+ metric_list:
3675
+ - metric: acc
3676
+ aggregation: mean
3677
+ higher_is_better: true
3678
+ output_type: multiple_choice
3679
+ repeats: 1
3680
+ should_decontaminate: false
3681
+ metadata:
3682
+ version: 0.0
3683
+ mmlu_high_school_microeconomics:
3684
+ task: mmlu_high_school_microeconomics
3685
+ task_alias: high_school_microeconomics
3686
+ group: mmlu_social_sciences
3687
+ group_alias: social_sciences
3688
+ dataset_path: hails/mmlu_no_train
3689
+ dataset_name: high_school_microeconomics
3690
+ test_split: test
3691
+ fewshot_split: dev
3692
+ doc_to_text: '{{question.strip()}}
3693
+
3694
+ A. {{choices[0]}}
3695
+
3696
+ B. {{choices[1]}}
3697
+
3698
+ C. {{choices[2]}}
3699
+
3700
+ D. {{choices[3]}}
3701
+
3702
+ Answer:'
3703
+ doc_to_target: answer
3704
+ doc_to_choice:
3705
+ - A
3706
+ - B
3707
+ - C
3708
+ - D
3709
+ description: 'The following are multiple choice questions (with answers)
3710
+ about high school microeconomics.
3711
+
3712
+
3713
+ '
3714
+ target_delimiter: ' '
3715
+ fewshot_delimiter: '
3716
+
3717
+
3718
+ '
3719
+ fewshot_config:
3720
+ sampler: first_n
3721
+ metric_list:
3722
+ - metric: acc
3723
+ aggregation: mean
3724
+ higher_is_better: true
3725
+ output_type: multiple_choice
3726
+ repeats: 1
3727
+ should_decontaminate: false
3728
+ metadata:
3729
+ version: 0.0
3730
+ mmlu_high_school_physics:
3731
+ task: mmlu_high_school_physics
3732
+ task_alias: high_school_physics
3733
+ group: mmlu_stem
3734
+ group_alias: stem
3735
+ dataset_path: hails/mmlu_no_train
3736
+ dataset_name: high_school_physics
3737
+ test_split: test
3738
+ fewshot_split: dev
3739
+ doc_to_text: '{{question.strip()}}
3740
+
3741
+ A. {{choices[0]}}
3742
+
3743
+ B. {{choices[1]}}
3744
+
3745
+ C. {{choices[2]}}
3746
+
3747
+ D. {{choices[3]}}
3748
+
3749
+ Answer:'
3750
+ doc_to_target: answer
3751
+ doc_to_choice:
3752
+ - A
3753
+ - B
3754
+ - C
3755
+ - D
3756
+ description: 'The following are multiple choice questions (with answers)
3757
+ about high school physics.
3758
+
3759
+
3760
+ '
3761
+ target_delimiter: ' '
3762
+ fewshot_delimiter: '
3763
+
3764
+
3765
+ '
3766
+ fewshot_config:
3767
+ sampler: first_n
3768
+ metric_list:
3769
+ - metric: acc
3770
+ aggregation: mean
3771
+ higher_is_better: true
3772
+ output_type: multiple_choice
3773
+ repeats: 1
3774
+ should_decontaminate: false
3775
+ metadata:
3776
+ version: 0.0
3777
+ mmlu_high_school_psychology:
3778
+ task: mmlu_high_school_psychology
3779
+ task_alias: high_school_psychology
3780
+ group: mmlu_social_sciences
3781
+ group_alias: social_sciences
3782
+ dataset_path: hails/mmlu_no_train
3783
+ dataset_name: high_school_psychology
3784
+ test_split: test
3785
+ fewshot_split: dev
3786
+ doc_to_text: '{{question.strip()}}
3787
+
3788
+ A. {{choices[0]}}
3789
+
3790
+ B. {{choices[1]}}
3791
+
3792
+ C. {{choices[2]}}
3793
+
3794
+ D. {{choices[3]}}
3795
+
3796
+ Answer:'
3797
+ doc_to_target: answer
3798
+ doc_to_choice:
3799
+ - A
3800
+ - B
3801
+ - C
3802
+ - D
3803
+ description: 'The following are multiple choice questions (with answers)
3804
+ about high school psychology.
3805
+
3806
+
3807
+ '
3808
+ target_delimiter: ' '
3809
+ fewshot_delimiter: '
3810
+
3811
+
3812
+ '
3813
+ fewshot_config:
3814
+ sampler: first_n
3815
+ metric_list:
3816
+ - metric: acc
3817
+ aggregation: mean
3818
+ higher_is_better: true
3819
+ output_type: multiple_choice
3820
+ repeats: 1
3821
+ should_decontaminate: false
3822
+ metadata:
3823
+ version: 0.0
3824
+ mmlu_high_school_statistics:
3825
+ task: mmlu_high_school_statistics
3826
+ task_alias: high_school_statistics
3827
+ group: mmlu_stem
3828
+ group_alias: stem
3829
+ dataset_path: hails/mmlu_no_train
3830
+ dataset_name: high_school_statistics
3831
+ test_split: test
3832
+ fewshot_split: dev
3833
+ doc_to_text: '{{question.strip()}}
3834
+
3835
+ A. {{choices[0]}}
3836
+
3837
+ B. {{choices[1]}}
3838
+
3839
+ C. {{choices[2]}}
3840
+
3841
+ D. {{choices[3]}}
3842
+
3843
+ Answer:'
3844
+ doc_to_target: answer
3845
+ doc_to_choice:
3846
+ - A
3847
+ - B
3848
+ - C
3849
+ - D
3850
+ description: 'The following are multiple choice questions (with answers)
3851
+ about high school statistics.
3852
+
3853
+
3854
+ '
3855
+ target_delimiter: ' '
3856
+ fewshot_delimiter: '
3857
+
3858
+
3859
+ '
3860
+ fewshot_config:
3861
+ sampler: first_n
3862
+ metric_list:
3863
+ - metric: acc
3864
+ aggregation: mean
3865
+ higher_is_better: true
3866
+ output_type: multiple_choice
3867
+ repeats: 1
3868
+ should_decontaminate: false
3869
+ metadata:
3870
+ version: 0.0
3871
+ mmlu_high_school_us_history:
3872
+ task: mmlu_high_school_us_history
3873
+ task_alias: high_school_us_history
3874
+ group: mmlu_humanities
3875
+ group_alias: humanities
3876
+ dataset_path: hails/mmlu_no_train
3877
+ dataset_name: high_school_us_history
3878
+ test_split: test
3879
+ fewshot_split: dev
3880
+ doc_to_text: '{{question.strip()}}
3881
+
3882
+ A. {{choices[0]}}
3883
+
3884
+ B. {{choices[1]}}
3885
+
3886
+ C. {{choices[2]}}
3887
+
3888
+ D. {{choices[3]}}
3889
+
3890
+ Answer:'
3891
+ doc_to_target: answer
3892
+ doc_to_choice:
3893
+ - A
3894
+ - B
3895
+ - C
3896
+ - D
3897
+ description: 'The following are multiple choice questions (with answers)
3898
+ about high school us history.
3899
+
3900
+
3901
+ '
3902
+ target_delimiter: ' '
3903
+ fewshot_delimiter: '
3904
+
3905
+
3906
+ '
3907
+ fewshot_config:
3908
+ sampler: first_n
3909
+ metric_list:
3910
+ - metric: acc
3911
+ aggregation: mean
3912
+ higher_is_better: true
3913
+ output_type: multiple_choice
3914
+ repeats: 1
3915
+ should_decontaminate: false
3916
+ metadata:
3917
+ version: 0.0
3918
+ mmlu_high_school_world_history:
3919
+ task: mmlu_high_school_world_history
3920
+ task_alias: high_school_world_history
3921
+ group: mmlu_humanities
3922
+ group_alias: humanities
3923
+ dataset_path: hails/mmlu_no_train
3924
+ dataset_name: high_school_world_history
3925
+ test_split: test
3926
+ fewshot_split: dev
3927
+ doc_to_text: '{{question.strip()}}
3928
+
3929
+ A. {{choices[0]}}
3930
+
3931
+ B. {{choices[1]}}
3932
+
3933
+ C. {{choices[2]}}
3934
+
3935
+ D. {{choices[3]}}
3936
+
3937
+ Answer:'
3938
+ doc_to_target: answer
3939
+ doc_to_choice:
3940
+ - A
3941
+ - B
3942
+ - C
3943
+ - D
3944
+ description: 'The following are multiple choice questions (with answers)
3945
+ about high school world history.
3946
+
3947
+
3948
+ '
3949
+ target_delimiter: ' '
3950
+ fewshot_delimiter: '
3951
+
3952
+
3953
+ '
3954
+ fewshot_config:
3955
+ sampler: first_n
3956
+ metric_list:
3957
+ - metric: acc
3958
+ aggregation: mean
3959
+ higher_is_better: true
3960
+ output_type: multiple_choice
3961
+ repeats: 1
3962
+ should_decontaminate: false
3963
+ metadata:
3964
+ version: 0.0
3965
+ mmlu_human_aging:
3966
+ task: mmlu_human_aging
3967
+ task_alias: human_aging
3968
+ group: mmlu_other
3969
+ group_alias: other
3970
+ dataset_path: hails/mmlu_no_train
3971
+ dataset_name: human_aging
3972
+ test_split: test
3973
+ fewshot_split: dev
3974
+ doc_to_text: '{{question.strip()}}
3975
+
3976
+ A. {{choices[0]}}
3977
+
3978
+ B. {{choices[1]}}
3979
+
3980
+ C. {{choices[2]}}
3981
+
3982
+ D. {{choices[3]}}
3983
+
3984
+ Answer:'
3985
+ doc_to_target: answer
3986
+ doc_to_choice:
3987
+ - A
3988
+ - B
3989
+ - C
3990
+ - D
3991
+ description: 'The following are multiple choice questions (with answers)
3992
+ about human aging.
3993
+
3994
+
3995
+ '
3996
+ target_delimiter: ' '
3997
+ fewshot_delimiter: '
3998
+
3999
+
4000
+ '
4001
+ fewshot_config:
4002
+ sampler: first_n
4003
+ metric_list:
4004
+ - metric: acc
4005
+ aggregation: mean
4006
+ higher_is_better: true
4007
+ output_type: multiple_choice
4008
+ repeats: 1
4009
+ should_decontaminate: false
4010
+ metadata:
4011
+ version: 0.0
4012
+ mmlu_human_sexuality:
4013
+ task: mmlu_human_sexuality
4014
+ task_alias: human_sexuality
4015
+ group: mmlu_social_sciences
4016
+ group_alias: social_sciences
4017
+ dataset_path: hails/mmlu_no_train
4018
+ dataset_name: human_sexuality
4019
+ test_split: test
4020
+ fewshot_split: dev
4021
+ doc_to_text: '{{question.strip()}}
4022
+
4023
+ A. {{choices[0]}}
4024
+
4025
+ B. {{choices[1]}}
4026
+
4027
+ C. {{choices[2]}}
4028
+
4029
+ D. {{choices[3]}}
4030
+
4031
+ Answer:'
4032
+ doc_to_target: answer
4033
+ doc_to_choice:
4034
+ - A
4035
+ - B
4036
+ - C
4037
+ - D
4038
+ description: 'The following are multiple choice questions (with answers)
4039
+ about human sexuality.
4040
+
4041
+
4042
+ '
4043
+ target_delimiter: ' '
4044
+ fewshot_delimiter: '
4045
+
4046
+
4047
+ '
4048
+ fewshot_config:
4049
+ sampler: first_n
4050
+ metric_list:
4051
+ - metric: acc
4052
+ aggregation: mean
4053
+ higher_is_better: true
4054
+ output_type: multiple_choice
4055
+ repeats: 1
4056
+ should_decontaminate: false
4057
+ metadata:
4058
+ version: 0.0
4059
+ mmlu_international_law:
4060
+ task: mmlu_international_law
4061
+ task_alias: international_law
4062
+ group: mmlu_humanities
4063
+ group_alias: humanities
4064
+ dataset_path: hails/mmlu_no_train
4065
+ dataset_name: international_law
4066
+ test_split: test
4067
+ fewshot_split: dev
4068
+ doc_to_text: '{{question.strip()}}
4069
+
4070
+ A. {{choices[0]}}
4071
+
4072
+ B. {{choices[1]}}
4073
+
4074
+ C. {{choices[2]}}
4075
+
4076
+ D. {{choices[3]}}
4077
+
4078
+ Answer:'
4079
+ doc_to_target: answer
4080
+ doc_to_choice:
4081
+ - A
4082
+ - B
4083
+ - C
4084
+ - D
4085
+ description: 'The following are multiple choice questions (with answers)
4086
+ about international law.
4087
+
4088
+
4089
+ '
4090
+ target_delimiter: ' '
4091
+ fewshot_delimiter: '
4092
+
4093
+
4094
+ '
4095
+ fewshot_config:
4096
+ sampler: first_n
4097
+ metric_list:
4098
+ - metric: acc
4099
+ aggregation: mean
4100
+ higher_is_better: true
4101
+ output_type: multiple_choice
4102
+ repeats: 1
4103
+ should_decontaminate: false
4104
+ metadata:
4105
+ version: 0.0
4106
+ mmlu_jurisprudence:
4107
+ task: mmlu_jurisprudence
4108
+ task_alias: jurisprudence
4109
+ group: mmlu_humanities
4110
+ group_alias: humanities
4111
+ dataset_path: hails/mmlu_no_train
4112
+ dataset_name: jurisprudence
4113
+ test_split: test
4114
+ fewshot_split: dev
4115
+ doc_to_text: '{{question.strip()}}
4116
+
4117
+ A. {{choices[0]}}
4118
+
4119
+ B. {{choices[1]}}
4120
+
4121
+ C. {{choices[2]}}
4122
+
4123
+ D. {{choices[3]}}
4124
+
4125
+ Answer:'
4126
+ doc_to_target: answer
4127
+ doc_to_choice:
4128
+ - A
4129
+ - B
4130
+ - C
4131
+ - D
4132
+ description: 'The following are multiple choice questions (with answers)
4133
+ about jurisprudence.
4134
+
4135
+
4136
+ '
4137
+ target_delimiter: ' '
4138
+ fewshot_delimiter: '
4139
+
4140
+
4141
+ '
4142
+ fewshot_config:
4143
+ sampler: first_n
4144
+ metric_list:
4145
+ - metric: acc
4146
+ aggregation: mean
4147
+ higher_is_better: true
4148
+ output_type: multiple_choice
4149
+ repeats: 1
4150
+ should_decontaminate: false
4151
+ metadata:
4152
+ version: 0.0
4153
+ mmlu_logical_fallacies:
4154
+ task: mmlu_logical_fallacies
4155
+ task_alias: logical_fallacies
4156
+ group: mmlu_humanities
4157
+ group_alias: humanities
4158
+ dataset_path: hails/mmlu_no_train
4159
+ dataset_name: logical_fallacies
4160
+ test_split: test
4161
+ fewshot_split: dev
4162
+ doc_to_text: '{{question.strip()}}
4163
+
4164
+ A. {{choices[0]}}
4165
+
4166
+ B. {{choices[1]}}
4167
+
4168
+ C. {{choices[2]}}
4169
+
4170
+ D. {{choices[3]}}
4171
+
4172
+ Answer:'
4173
+ doc_to_target: answer
4174
+ doc_to_choice:
4175
+ - A
4176
+ - B
4177
+ - C
4178
+ - D
4179
+ description: 'The following are multiple choice questions (with answers)
4180
+ about logical fallacies.
4181
+
4182
+
4183
+ '
4184
+ target_delimiter: ' '
4185
+ fewshot_delimiter: '
4186
+
4187
+
4188
+ '
4189
+ fewshot_config:
4190
+ sampler: first_n
4191
+ metric_list:
4192
+ - metric: acc
4193
+ aggregation: mean
4194
+ higher_is_better: true
4195
+ output_type: multiple_choice
4196
+ repeats: 1
4197
+ should_decontaminate: false
4198
+ metadata:
4199
+ version: 0.0
4200
+ mmlu_machine_learning:
4201
+ task: mmlu_machine_learning
4202
+ task_alias: machine_learning
4203
+ group: mmlu_stem
4204
+ group_alias: stem
4205
+ dataset_path: hails/mmlu_no_train
4206
+ dataset_name: machine_learning
4207
+ test_split: test
4208
+ fewshot_split: dev
4209
+ doc_to_text: '{{question.strip()}}
4210
+
4211
+ A. {{choices[0]}}
4212
+
4213
+ B. {{choices[1]}}
4214
+
4215
+ C. {{choices[2]}}
4216
+
4217
+ D. {{choices[3]}}
4218
+
4219
+ Answer:'
4220
+ doc_to_target: answer
4221
+ doc_to_choice:
4222
+ - A
4223
+ - B
4224
+ - C
4225
+ - D
4226
+ description: 'The following are multiple choice questions (with answers)
4227
+ about machine learning.
4228
+
4229
+
4230
+ '
4231
+ target_delimiter: ' '
4232
+ fewshot_delimiter: '
4233
+
4234
+
4235
+ '
4236
+ fewshot_config:
4237
+ sampler: first_n
4238
+ metric_list:
4239
+ - metric: acc
4240
+ aggregation: mean
4241
+ higher_is_better: true
4242
+ output_type: multiple_choice
4243
+ repeats: 1
4244
+ should_decontaminate: false
4245
+ metadata:
4246
+ version: 0.0
4247
+ mmlu_management:
4248
+ task: mmlu_management
4249
+ task_alias: management
4250
+ group: mmlu_other
4251
+ group_alias: other
4252
+ dataset_path: hails/mmlu_no_train
4253
+ dataset_name: management
4254
+ test_split: test
4255
+ fewshot_split: dev
4256
+ doc_to_text: '{{question.strip()}}
4257
+
4258
+ A. {{choices[0]}}
4259
+
4260
+ B. {{choices[1]}}
4261
+
4262
+ C. {{choices[2]}}
4263
+
4264
+ D. {{choices[3]}}
4265
+
4266
+ Answer:'
4267
+ doc_to_target: answer
4268
+ doc_to_choice:
4269
+ - A
4270
+ - B
4271
+ - C
4272
+ - D
4273
+ description: 'The following are multiple choice questions (with answers)
4274
+ about management.
4275
+
4276
+
4277
+ '
4278
+ target_delimiter: ' '
4279
+ fewshot_delimiter: '
4280
+
4281
+
4282
+ '
4283
+ fewshot_config:
4284
+ sampler: first_n
4285
+ metric_list:
4286
+ - metric: acc
4287
+ aggregation: mean
4288
+ higher_is_better: true
4289
+ output_type: multiple_choice
4290
+ repeats: 1
4291
+ should_decontaminate: false
4292
+ metadata:
4293
+ version: 0.0
4294
+ mmlu_marketing:
4295
+ task: mmlu_marketing
4296
+ task_alias: marketing
4297
+ group: mmlu_other
4298
+ group_alias: other
4299
+ dataset_path: hails/mmlu_no_train
4300
+ dataset_name: marketing
4301
+ test_split: test
4302
+ fewshot_split: dev
4303
+ doc_to_text: '{{question.strip()}}
4304
+
4305
+ A. {{choices[0]}}
4306
+
4307
+ B. {{choices[1]}}
4308
+
4309
+ C. {{choices[2]}}
4310
+
4311
+ D. {{choices[3]}}
4312
+
4313
+ Answer:'
4314
+ doc_to_target: answer
4315
+ doc_to_choice:
4316
+ - A
4317
+ - B
4318
+ - C
4319
+ - D
4320
+ description: 'The following are multiple choice questions (with answers)
4321
+ about marketing.
4322
+
4323
+
4324
+ '
4325
+ target_delimiter: ' '
4326
+ fewshot_delimiter: '
4327
+
4328
+
4329
+ '
4330
+ fewshot_config:
4331
+ sampler: first_n
4332
+ metric_list:
4333
+ - metric: acc
4334
+ aggregation: mean
4335
+ higher_is_better: true
4336
+ output_type: multiple_choice
4337
+ repeats: 1
4338
+ should_decontaminate: false
4339
+ metadata:
4340
+ version: 0.0
4341
+ mmlu_medical_genetics:
4342
+ task: mmlu_medical_genetics
4343
+ task_alias: medical_genetics
4344
+ group: mmlu_other
4345
+ group_alias: other
4346
+ dataset_path: hails/mmlu_no_train
4347
+ dataset_name: medical_genetics
4348
+ test_split: test
4349
+ fewshot_split: dev
4350
+ doc_to_text: '{{question.strip()}}
4351
+
4352
+ A. {{choices[0]}}
4353
+
4354
+ B. {{choices[1]}}
4355
+
4356
+ C. {{choices[2]}}
4357
+
4358
+ D. {{choices[3]}}
4359
+
4360
+ Answer:'
4361
+ doc_to_target: answer
4362
+ doc_to_choice:
4363
+ - A
4364
+ - B
4365
+ - C
4366
+ - D
4367
+ description: 'The following are multiple choice questions (with answers)
4368
+ about medical genetics.
4369
+
4370
+
4371
+ '
4372
+ target_delimiter: ' '
4373
+ fewshot_delimiter: '
4374
+
4375
+
4376
+ '
4377
+ fewshot_config:
4378
+ sampler: first_n
4379
+ metric_list:
4380
+ - metric: acc
4381
+ aggregation: mean
4382
+ higher_is_better: true
4383
+ output_type: multiple_choice
4384
+ repeats: 1
4385
+ should_decontaminate: false
4386
+ metadata:
4387
+ version: 0.0
4388
+ mmlu_miscellaneous:
4389
+ task: mmlu_miscellaneous
4390
+ task_alias: miscellaneous
4391
+ group: mmlu_other
4392
+ group_alias: other
4393
+ dataset_path: hails/mmlu_no_train
4394
+ dataset_name: miscellaneous
4395
+ test_split: test
4396
+ fewshot_split: dev
4397
+ doc_to_text: '{{question.strip()}}
4398
+
4399
+ A. {{choices[0]}}
4400
+
4401
+ B. {{choices[1]}}
4402
+
4403
+ C. {{choices[2]}}
4404
+
4405
+ D. {{choices[3]}}
4406
+
4407
+ Answer:'
4408
+ doc_to_target: answer
4409
+ doc_to_choice:
4410
+ - A
4411
+ - B
4412
+ - C
4413
+ - D
4414
+ description: 'The following are multiple choice questions (with answers)
4415
+ about miscellaneous.
4416
+
4417
+
4418
+ '
4419
+ target_delimiter: ' '
4420
+ fewshot_delimiter: '
4421
+
4422
+
4423
+ '
4424
+ fewshot_config:
4425
+ sampler: first_n
4426
+ metric_list:
4427
+ - metric: acc
4428
+ aggregation: mean
4429
+ higher_is_better: true
4430
+ output_type: multiple_choice
4431
+ repeats: 1
4432
+ should_decontaminate: false
4433
+ metadata:
4434
+ version: 0.0
4435
+ mmlu_moral_disputes:
+ task: mmlu_moral_disputes
+ task_alias: moral_disputes
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: moral_disputes
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about moral disputes.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_moral_scenarios:
+ task: mmlu_moral_scenarios
+ task_alias: moral_scenarios
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: moral_scenarios
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about moral scenarios.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_nutrition:
+ task: mmlu_nutrition
+ task_alias: nutrition
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: nutrition
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about nutrition.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_philosophy:
+ task: mmlu_philosophy
+ task_alias: philosophy
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: philosophy
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about philosophy.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_prehistory:
+ task: mmlu_prehistory
+ task_alias: prehistory
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: prehistory
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about prehistory.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_professional_accounting:
+ task: mmlu_professional_accounting
+ task_alias: professional_accounting
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: professional_accounting
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about professional accounting.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_professional_law:
+ task: mmlu_professional_law
+ task_alias: professional_law
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: professional_law
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about professional law.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_professional_medicine:
+ task: mmlu_professional_medicine
+ task_alias: professional_medicine
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: professional_medicine
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about professional medicine.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_professional_psychology:
+ task: mmlu_professional_psychology
+ task_alias: professional_psychology
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: professional_psychology
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about professional psychology.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_public_relations:
+ task: mmlu_public_relations
+ task_alias: public_relations
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: public_relations
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about public relations.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_security_studies:
+ task: mmlu_security_studies
+ task_alias: security_studies
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: security_studies
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about security studies.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_sociology:
+ task: mmlu_sociology
+ task_alias: sociology
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: sociology
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about sociology.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_us_foreign_policy:
+ task: mmlu_us_foreign_policy
+ task_alias: us_foreign_policy
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: us_foreign_policy
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about us foreign policy.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_virology:
+ task: mmlu_virology
+ task_alias: virology
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: virology
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about virology.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_world_religions:
+ task: mmlu_world_religions
+ task_alias: world_religions
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: world_religions
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about world religions.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ versions:
+ mmlu_abstract_algebra: 0.0
+ mmlu_anatomy: 0.0
+ mmlu_astronomy: 0.0
+ mmlu_business_ethics: 0.0
+ mmlu_clinical_knowledge: 0.0
+ mmlu_college_biology: 0.0
+ mmlu_college_chemistry: 0.0
+ mmlu_college_computer_science: 0.0
+ mmlu_college_mathematics: 0.0
+ mmlu_college_medicine: 0.0
+ mmlu_college_physics: 0.0
+ mmlu_computer_security: 0.0
+ mmlu_conceptual_physics: 0.0
+ mmlu_econometrics: 0.0
+ mmlu_electrical_engineering: 0.0
+ mmlu_elementary_mathematics: 0.0
+ mmlu_formal_logic: 0.0
+ mmlu_global_facts: 0.0
+ mmlu_high_school_biology: 0.0
+ mmlu_high_school_chemistry: 0.0
+ mmlu_high_school_computer_science: 0.0
+ mmlu_high_school_european_history: 0.0
+ mmlu_high_school_geography: 0.0
+ mmlu_high_school_government_and_politics: 0.0
+ mmlu_high_school_macroeconomics: 0.0
+ mmlu_high_school_mathematics: 0.0
+ mmlu_high_school_microeconomics: 0.0
+ mmlu_high_school_physics: 0.0
+ mmlu_high_school_psychology: 0.0
+ mmlu_high_school_statistics: 0.0
+ mmlu_high_school_us_history: 0.0
+ mmlu_high_school_world_history: 0.0
+ mmlu_human_aging: 0.0
+ mmlu_human_sexuality: 0.0
+ mmlu_international_law: 0.0
+ mmlu_jurisprudence: 0.0
+ mmlu_logical_fallacies: 0.0
+ mmlu_machine_learning: 0.0
+ mmlu_management: 0.0
+ mmlu_marketing: 0.0
+ mmlu_medical_genetics: 0.0
+ mmlu_miscellaneous: 0.0
+ mmlu_moral_disputes: 0.0
+ mmlu_moral_scenarios: 0.0
+ mmlu_nutrition: 0.0
+ mmlu_philosophy: 0.0
+ mmlu_prehistory: 0.0
+ mmlu_professional_accounting: 0.0
+ mmlu_professional_law: 0.0
+ mmlu_professional_medicine: 0.0
+ mmlu_professional_psychology: 0.0
+ mmlu_public_relations: 0.0
+ mmlu_security_studies: 0.0
+ mmlu_sociology: 0.0
+ mmlu_us_foreign_policy: 0.0
+ mmlu_virology: 0.0
+ mmlu_world_religions: 0.0
+ n-shot:
+ mmlu: 0
+ config:
+ model: vllm
+ model_args: pretrained=DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.8,max_model_len=2048,trust_remote_code=True
+ batch_size: auto
+ batch_sizes: []
+ bootstrap_iters: 100000
+ git_hash: cddf85d
+ pretty_env_info: 'PyTorch version: 2.1.2+cu121
+
+ Is debug build: False
+
+ CUDA used to build PyTorch: 12.1
+
+ ROCM used to build PyTorch: N/A
+
+
+ OS: Ubuntu 22.04.3 LTS (x86_64)
+
+ GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
+
+ Clang version: Could not collect
+
+ CMake version: version 3.25.0
+
+ Libc version: glibc-2.35
+
+
+ Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit
+ runtime)
+
+ Python platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
+
+ Is CUDA available: True
+
+ CUDA runtime version: 11.8.89
+
+ CUDA_MODULE_LOADING set to: LAZY
+
+ GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
+
+ Nvidia driver version: 550.54.15
+
+ cuDNN version: Could not collect
+
+ HIP runtime version: N/A
+
+ MIOpen runtime version: N/A
+
+ Is XNNPACK available: True
+
+
+ CPU:
+
+ Architecture: x86_64
+
+ CPU op-mode(s): 32-bit, 64-bit
+
+ Address sizes: 52 bits physical, 57 bits virtual
+
+ Byte Order: Little Endian
+
+ CPU(s): 64
+
+ On-line CPU(s) list: 0-63
+
+ Vendor ID: AuthenticAMD
+
+ Model name: AMD EPYC 9354 32-Core Processor
+
+ CPU family: 25
+
+ Model: 17
+
+ Thread(s) per core: 2
+
+ Core(s) per socket: 32
+
+ Socket(s): 1
+
+ Stepping: 1
+
+ Frequency boost: enabled
+
+ CPU max MHz: 3799.0720
+
+ CPU min MHz: 1500.0000
+
+ BogoMIPS: 6499.74
+
+ Flags: fpu vme de pse tsc msr pae mce cx8 apic
+ sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
+ mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl
+ nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3
+ fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand
+ lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
+ osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc
+ mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba perfmon_v2 ibrs
+ ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid
+ cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd
+ sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
+ cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd
+ amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
+ decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl
+ vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni
+ avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm
+ flush_l1d
+
+ Virtualization: AMD-V
+
+ L1d cache: 1 MiB (32 instances)
+
+ L1i cache: 1 MiB (32 instances)
+
+ L2 cache: 32 MiB (32 instances)
+
+ L3 cache: 256 MiB (8 instances)
+
+ NUMA node(s): 1
+
+ NUMA node0 CPU(s): 0-63
+
+ Vulnerability Gather data sampling: Not affected
+
+ Vulnerability Itlb multihit: Not affected
+
+ Vulnerability L1tf: Not affected
+
+ Vulnerability Mds: Not affected
+
+ Vulnerability Meltdown: Not affected
+
+ Vulnerability Mmio stale data: Not affected
+
+ Vulnerability Retbleed: Not affected
+
+ Vulnerability Spec rstack overflow: Mitigation; Safe RET
+
+ Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass
+ disabled via prctl
+
+ Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers
+ and __user pointer sanitization
+
+ Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS;
+ IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected;
+ BHI Not affected
+
+ Vulnerability Srbds: Not affected
+
+ Vulnerability Tsx async abort: Not affected
+
+
  Versions of relevant libraries:
 
  [pip3] numpy==1.24.1