Xiaowen-dg committed on
Commit
40d0894
1 Parent(s): 7741b8f

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +3252 -0
README.md CHANGED
@@ -2148,6 +2148,3258 @@ model-index:
  Vulnerability Tsx async abort: Not affected
 
 
+ Versions of relevant libraries:
+
+ [pip3] numpy==1.24.1
+
+ [pip3] torch==2.1.2
+
+ [pip3] torchaudio==2.0.2+cu118
+
+ [pip3] torchvision==0.15.2+cu118
+
+ [pip3] triton==2.1.0
+
+ [conda] Could not collect'
+ transformers_version: 4.42.4
+ - task:
+ type: mmlu
+ dataset:
+ name: mmlu
+ type: public-dataset
+ metrics:
+ - type: acc
+ value: '0.634'
+ args:
+ results:
+ mmlu:
+ acc,none: 0.6240564022219057
+ acc_stderr,none: 0.0038572036515963077
+ alias: mmlu
+ mmlu_humanities:
+ alias: ' - humanities'
+ acc,none: 0.5704569606801275
+ acc_stderr,none: 0.00680518705216219
+ mmlu_formal_logic:
+ alias: ' - formal_logic'
+ acc,none: 0.42857142857142855
+ acc_stderr,none: 0.0442626668137991
+ mmlu_high_school_european_history:
+ alias: ' - high_school_european_history'
+ acc,none: 0.7393939393939394
+ acc_stderr,none: 0.034277431758165236
+ mmlu_high_school_us_history:
+ alias: ' - high_school_us_history'
+ acc,none: 0.8235294117647058
+ acc_stderr,none: 0.02675640153807895
+ mmlu_high_school_world_history:
+ alias: ' - high_school_world_history'
+ acc,none: 0.8354430379746836
+ acc_stderr,none: 0.024135736240566946
+ mmlu_international_law:
+ alias: ' - international_law'
+ acc,none: 0.71900826446281
+ acc_stderr,none: 0.04103203830514512
+ mmlu_jurisprudence:
+ alias: ' - jurisprudence'
+ acc,none: 0.7592592592592593
+ acc_stderr,none: 0.04133119440243839
+ mmlu_logical_fallacies:
+ alias: ' - logical_fallacies'
+ acc,none: 0.7668711656441718
+ acc_stderr,none: 0.0332201579577674
+ mmlu_moral_disputes:
+ alias: ' - moral_disputes'
+ acc,none: 0.6502890173410405
+ acc_stderr,none: 0.02567428145653102
+ mmlu_moral_scenarios:
+ alias: ' - moral_scenarios'
+ acc,none: 0.35307262569832404
+ acc_stderr,none: 0.01598420454526857
+ mmlu_philosophy:
+ alias: ' - philosophy'
+ acc,none: 0.7009646302250804
+ acc_stderr,none: 0.026003301117885142
+ mmlu_prehistory:
+ alias: ' - prehistory'
+ acc,none: 0.7160493827160493
+ acc_stderr,none: 0.02508947852376513
+ mmlu_professional_law:
+ alias: ' - professional_law'
+ acc,none: 0.470013037809648
+ acc_stderr,none: 0.012747248967079062
+ mmlu_world_religions:
+ alias: ' - world_religions'
+ acc,none: 0.7953216374269005
+ acc_stderr,none: 0.030944459778533204
+ mmlu_other:
+ alias: ' - other'
+ acc,none: 0.7151593176697779
+ acc_stderr,none: 0.00781329664246705
+ mmlu_business_ethics:
+ alias: ' - business_ethics'
+ acc,none: 0.61
+ acc_stderr,none: 0.04902071300001974
+ mmlu_clinical_knowledge:
+ alias: ' - clinical_knowledge'
+ acc,none: 0.7584905660377359
+ acc_stderr,none: 0.026341480371118355
+ mmlu_college_medicine:
+ alias: ' - college_medicine'
+ acc,none: 0.6589595375722543
+ acc_stderr,none: 0.036146654241808254
+ mmlu_global_facts:
+ alias: ' - global_facts'
+ acc,none: 0.41
+ acc_stderr,none: 0.04943110704237102
+ mmlu_human_aging:
+ alias: ' - human_aging'
+ acc,none: 0.6860986547085202
+ acc_stderr,none: 0.031146796482972465
+ mmlu_management:
+ alias: ' - management'
+ acc,none: 0.8543689320388349
+ acc_stderr,none: 0.03492606476623789
+ mmlu_marketing:
+ alias: ' - marketing'
+ acc,none: 0.8717948717948718
+ acc_stderr,none: 0.02190190511507333
+ mmlu_medical_genetics:
+ alias: ' - medical_genetics'
+ acc,none: 0.75
+ acc_stderr,none: 0.04351941398892446
+ mmlu_miscellaneous:
+ alias: ' - miscellaneous'
+ acc,none: 0.8263090676883781
+ acc_stderr,none: 0.013547415658662264
+ mmlu_nutrition:
+ alias: ' - nutrition'
+ acc,none: 0.7091503267973857
+ acc_stderr,none: 0.02600480036395213
+ mmlu_professional_accounting:
+ alias: ' - professional_accounting'
+ acc,none: 0.5212765957446809
+ acc_stderr,none: 0.029800481645628693
+ mmlu_professional_medicine:
+ alias: ' - professional_medicine'
+ acc,none: 0.6875
+ acc_stderr,none: 0.02815637344037142
+ mmlu_virology:
+ alias: ' - virology'
+ acc,none: 0.5240963855421686
+ acc_stderr,none: 0.038879718495972646
+ mmlu_social_sciences:
+ alias: ' - social_sciences'
+ acc,none: 0.7221319467013325
+ acc_stderr,none: 0.007909660127989188
+ mmlu_econometrics:
+ alias: ' - econometrics'
+ acc,none: 0.5
+ acc_stderr,none: 0.047036043419179864
+ mmlu_high_school_geography:
+ alias: ' - high_school_geography'
+ acc,none: 0.7575757575757576
+ acc_stderr,none: 0.030532892233932026
+ mmlu_high_school_government_and_politics:
+ alias: ' - high_school_government_and_politics'
+ acc,none: 0.8652849740932642
+ acc_stderr,none: 0.024639789097709437
+ mmlu_high_school_macroeconomics:
+ alias: ' - high_school_macroeconomics'
+ acc,none: 0.5923076923076923
+ acc_stderr,none: 0.024915243985987847
+ mmlu_high_school_microeconomics:
+ alias: ' - high_school_microeconomics'
+ acc,none: 0.6932773109243697
+ acc_stderr,none: 0.02995382389188703
+ mmlu_high_school_psychology:
+ alias: ' - high_school_psychology'
+ acc,none: 0.7963302752293578
+ acc_stderr,none: 0.017266742087630797
+ mmlu_human_sexuality:
+ alias: ' - human_sexuality'
+ acc,none: 0.7862595419847328
+ acc_stderr,none: 0.035954616117746904
+ mmlu_professional_psychology:
+ alias: ' - professional_psychology'
+ acc,none: 0.6683006535947712
+ acc_stderr,none: 0.01904748523936038
+ mmlu_public_relations:
+ alias: ' - public_relations'
+ acc,none: 0.6545454545454545
+ acc_stderr,none: 0.04554619617541054
+ mmlu_security_studies:
+ alias: ' - security_studies'
+ acc,none: 0.726530612244898
+ acc_stderr,none: 0.02853556033712844
+ mmlu_sociology:
+ alias: ' - sociology'
+ acc,none: 0.845771144278607
+ acc_stderr,none: 0.025538433368578337
+ mmlu_us_foreign_policy:
+ alias: ' - us_foreign_policy'
+ acc,none: 0.86
+ acc_stderr,none: 0.03487350880197769
+ mmlu_stem:
+ alias: ' - stem'
+ acc,none: 0.5185537583254044
+ acc_stderr,none: 0.008550177348592522
+ mmlu_abstract_algebra:
+ alias: ' - abstract_algebra'
+ acc,none: 0.36
+ acc_stderr,none: 0.04824181513244218
+ mmlu_anatomy:
+ alias: ' - anatomy'
+ acc,none: 0.6074074074074074
+ acc_stderr,none: 0.04218506215368879
+ mmlu_astronomy:
+ alias: ' - astronomy'
+ acc,none: 0.6973684210526315
+ acc_stderr,none: 0.03738520676119668
+ mmlu_college_biology:
+ alias: ' - college_biology'
+ acc,none: 0.7916666666666666
+ acc_stderr,none: 0.033961162058453336
+ mmlu_college_chemistry:
+ alias: ' - college_chemistry'
+ acc,none: 0.4
+ acc_stderr,none: 0.04923659639173309
+ mmlu_college_computer_science:
+ alias: ' - college_computer_science'
+ acc,none: 0.42
+ acc_stderr,none: 0.049604496374885836
+ mmlu_college_mathematics:
+ alias: ' - college_mathematics'
+ acc,none: 0.33
+ acc_stderr,none: 0.047258156262526045
+ mmlu_college_physics:
+ alias: ' - college_physics'
+ acc,none: 0.35294117647058826
+ acc_stderr,none: 0.047551296160629475
+ mmlu_computer_security:
+ alias: ' - computer_security'
+ acc,none: 0.76
+ acc_stderr,none: 0.042923469599092816
+ mmlu_conceptual_physics:
+ alias: ' - conceptual_physics'
+ acc,none: 0.5531914893617021
+ acc_stderr,none: 0.032500536843658404
+ mmlu_electrical_engineering:
+ alias: ' - electrical_engineering'
+ acc,none: 0.5172413793103449
+ acc_stderr,none: 0.04164188720169375
+ mmlu_elementary_mathematics:
+ alias: ' - elementary_mathematics'
+ acc,none: 0.42328042328042326
+ acc_stderr,none: 0.025446365634406772
+ mmlu_high_school_biology:
+ alias: ' - high_school_biology'
+ acc,none: 0.7451612903225806
+ acc_stderr,none: 0.0247901184593322
+ mmlu_high_school_chemistry:
+ alias: ' - high_school_chemistry'
+ acc,none: 0.4827586206896552
+ acc_stderr,none: 0.035158955511657
+ mmlu_high_school_computer_science:
+ alias: ' - high_school_computer_science'
+ acc,none: 0.65
+ acc_stderr,none: 0.0479372485441102
+ mmlu_high_school_mathematics:
+ alias: ' - high_school_mathematics'
+ acc,none: 0.37407407407407406
+ acc_stderr,none: 0.029502861128955286
+ mmlu_high_school_physics:
+ alias: ' - high_school_physics'
+ acc,none: 0.3841059602649007
+ acc_stderr,none: 0.03971301814719197
+ mmlu_high_school_statistics:
+ alias: ' - high_school_statistics'
+ acc,none: 0.4722222222222222
+ acc_stderr,none: 0.0340470532865388
+ mmlu_machine_learning:
+ alias: ' - machine_learning'
+ acc,none: 0.44642857142857145
+ acc_stderr,none: 0.04718471485219588
+ groups:
+ mmlu:
+ acc,none: 0.6240564022219057
+ acc_stderr,none: 0.0038572036515963077
+ alias: mmlu
+ mmlu_humanities:
+ alias: ' - humanities'
+ acc,none: 0.5704569606801275
+ acc_stderr,none: 0.00680518705216219
+ mmlu_other:
+ alias: ' - other'
+ acc,none: 0.7151593176697779
+ acc_stderr,none: 0.00781329664246705
+ mmlu_social_sciences:
+ alias: ' - social_sciences'
+ acc,none: 0.7221319467013325
+ acc_stderr,none: 0.007909660127989188
+ mmlu_stem:
+ alias: ' - stem'
+ acc,none: 0.5185537583254044
+ acc_stderr,none: 0.008550177348592522
+ group_subtasks:
+ mmlu_stem:
+ - mmlu_college_computer_science
+ - mmlu_college_chemistry
+ - mmlu_college_biology
+ - mmlu_astronomy
+ - mmlu_anatomy
+ - mmlu_abstract_algebra
+ - mmlu_machine_learning
+ - mmlu_high_school_statistics
+ - mmlu_high_school_physics
+ - mmlu_high_school_mathematics
+ - mmlu_high_school_computer_science
+ - mmlu_high_school_chemistry
+ - mmlu_high_school_biology
+ - mmlu_elementary_mathematics
+ - mmlu_electrical_engineering
+ - mmlu_conceptual_physics
+ - mmlu_computer_security
+ - mmlu_college_physics
+ - mmlu_college_mathematics
+ mmlu_other:
+ - mmlu_clinical_knowledge
+ - mmlu_business_ethics
+ - mmlu_virology
+ - mmlu_professional_medicine
+ - mmlu_professional_accounting
+ - mmlu_nutrition
+ - mmlu_miscellaneous
+ - mmlu_medical_genetics
+ - mmlu_marketing
+ - mmlu_management
+ - mmlu_human_aging
+ - mmlu_global_facts
+ - mmlu_college_medicine
+ mmlu_social_sciences:
+ - mmlu_us_foreign_policy
+ - mmlu_sociology
+ - mmlu_security_studies
+ - mmlu_public_relations
+ - mmlu_professional_psychology
+ - mmlu_human_sexuality
+ - mmlu_high_school_psychology
+ - mmlu_high_school_microeconomics
+ - mmlu_high_school_macroeconomics
+ - mmlu_high_school_government_and_politics
+ - mmlu_high_school_geography
+ - mmlu_econometrics
+ mmlu_humanities:
+ - mmlu_world_religions
+ - mmlu_professional_law
+ - mmlu_prehistory
+ - mmlu_philosophy
+ - mmlu_moral_scenarios
+ - mmlu_moral_disputes
+ - mmlu_logical_fallacies
+ - mmlu_jurisprudence
+ - mmlu_international_law
+ - mmlu_high_school_world_history
+ - mmlu_high_school_us_history
+ - mmlu_high_school_european_history
+ - mmlu_formal_logic
+ mmlu:
+ - mmlu_humanities
+ - mmlu_social_sciences
+ - mmlu_other
+ - mmlu_stem
+ configs:
+ mmlu_abstract_algebra:
+ task: mmlu_abstract_algebra
+ task_alias: abstract_algebra
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: abstract_algebra
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about abstract algebra.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_anatomy:
+ task: mmlu_anatomy
+ task_alias: anatomy
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: anatomy
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about anatomy.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_astronomy:
+ task: mmlu_astronomy
+ task_alias: astronomy
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: astronomy
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about astronomy.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_business_ethics:
+ task: mmlu_business_ethics
+ task_alias: business_ethics
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: business_ethics
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about business ethics.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_clinical_knowledge:
+ task: mmlu_clinical_knowledge
+ task_alias: clinical_knowledge
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: clinical_knowledge
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about clinical knowledge.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_college_biology:
+ task: mmlu_college_biology
+ task_alias: college_biology
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: college_biology
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about college biology.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_college_chemistry:
+ task: mmlu_college_chemistry
+ task_alias: college_chemistry
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: college_chemistry
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about college chemistry.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_college_computer_science:
+ task: mmlu_college_computer_science
+ task_alias: college_computer_science
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: college_computer_science
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about college computer science.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_college_mathematics:
+ task: mmlu_college_mathematics
+ task_alias: college_mathematics
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: college_mathematics
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about college mathematics.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_college_medicine:
+ task: mmlu_college_medicine
+ task_alias: college_medicine
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: college_medicine
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about college medicine.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_college_physics:
+ task: mmlu_college_physics
+ task_alias: college_physics
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: college_physics
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about college physics.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_computer_security:
+ task: mmlu_computer_security
+ task_alias: computer_security
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: computer_security
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about computer security.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_conceptual_physics:
+ task: mmlu_conceptual_physics
+ task_alias: conceptual_physics
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: conceptual_physics
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about conceptual physics.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_econometrics:
+ task: mmlu_econometrics
+ task_alias: econometrics
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: econometrics
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about econometrics.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_electrical_engineering:
+ task: mmlu_electrical_engineering
+ task_alias: electrical_engineering
+ group: mmlu_stem
+ group_alias: stem
+ dataset_path: hails/mmlu_no_train
+ dataset_name: electrical_engineering
3177
+ test_split: test
3178
+ fewshot_split: dev
3179
+ doc_to_text: '{{question.strip()}}
3180
+
3181
+ A. {{choices[0]}}
3182
+
3183
+ B. {{choices[1]}}
3184
+
3185
+ C. {{choices[2]}}
3186
+
3187
+ D. {{choices[3]}}
3188
+
3189
+ Answer:'
3190
+ doc_to_target: answer
3191
+ doc_to_choice:
3192
+ - A
3193
+ - B
3194
+ - C
3195
+ - D
3196
+ description: 'The following are multiple choice questions (with answers)
3197
+ about electrical engineering.
3198
+
3199
+
3200
+ '
3201
+ target_delimiter: ' '
3202
+ fewshot_delimiter: '
3203
+
3204
+
3205
+ '
3206
+ fewshot_config:
3207
+ sampler: first_n
3208
+ metric_list:
3209
+ - metric: acc
3210
+ aggregation: mean
3211
+ higher_is_better: true
3212
+ output_type: multiple_choice
3213
+ repeats: 1
3214
+ should_decontaminate: false
3215
+ metadata:
3216
+ version: 0.0
3217
+ mmlu_elementary_mathematics:
3218
+ task: mmlu_elementary_mathematics
3219
+ task_alias: elementary_mathematics
3220
+ group: mmlu_stem
3221
+ group_alias: stem
3222
+ dataset_path: hails/mmlu_no_train
3223
+ dataset_name: elementary_mathematics
3224
+ test_split: test
3225
+ fewshot_split: dev
3226
+ doc_to_text: '{{question.strip()}}
3227
+
3228
+ A. {{choices[0]}}
3229
+
3230
+ B. {{choices[1]}}
3231
+
3232
+ C. {{choices[2]}}
3233
+
3234
+ D. {{choices[3]}}
3235
+
3236
+ Answer:'
3237
+ doc_to_target: answer
3238
+ doc_to_choice:
3239
+ - A
3240
+ - B
3241
+ - C
3242
+ - D
3243
+ description: 'The following are multiple choice questions (with answers)
3244
+ about elementary mathematics.
3245
+
3246
+
3247
+ '
3248
+ target_delimiter: ' '
3249
+ fewshot_delimiter: '
3250
+
3251
+
3252
+ '
3253
+ fewshot_config:
3254
+ sampler: first_n
3255
+ metric_list:
3256
+ - metric: acc
3257
+ aggregation: mean
3258
+ higher_is_better: true
3259
+ output_type: multiple_choice
3260
+ repeats: 1
3261
+ should_decontaminate: false
3262
+ metadata:
3263
+ version: 0.0
3264
+ mmlu_formal_logic:
3265
+ task: mmlu_formal_logic
3266
+ task_alias: formal_logic
3267
+ group: mmlu_humanities
3268
+ group_alias: humanities
3269
+ dataset_path: hails/mmlu_no_train
3270
+ dataset_name: formal_logic
3271
+ test_split: test
3272
+ fewshot_split: dev
3273
+ doc_to_text: '{{question.strip()}}
3274
+
3275
+ A. {{choices[0]}}
3276
+
3277
+ B. {{choices[1]}}
3278
+
3279
+ C. {{choices[2]}}
3280
+
3281
+ D. {{choices[3]}}
3282
+
3283
+ Answer:'
3284
+ doc_to_target: answer
3285
+ doc_to_choice:
3286
+ - A
3287
+ - B
3288
+ - C
3289
+ - D
3290
+ description: 'The following are multiple choice questions (with answers)
3291
+ about formal logic.
3292
+
3293
+
3294
+ '
3295
+ target_delimiter: ' '
3296
+ fewshot_delimiter: '
3297
+
3298
+
3299
+ '
3300
+ fewshot_config:
3301
+ sampler: first_n
3302
+ metric_list:
3303
+ - metric: acc
3304
+ aggregation: mean
3305
+ higher_is_better: true
3306
+ output_type: multiple_choice
3307
+ repeats: 1
3308
+ should_decontaminate: false
3309
+ metadata:
3310
+ version: 0.0
3311
+ mmlu_global_facts:
3312
+ task: mmlu_global_facts
3313
+ task_alias: global_facts
3314
+ group: mmlu_other
3315
+ group_alias: other
3316
+ dataset_path: hails/mmlu_no_train
3317
+ dataset_name: global_facts
3318
+ test_split: test
3319
+ fewshot_split: dev
3320
+ doc_to_text: '{{question.strip()}}
3321
+
3322
+ A. {{choices[0]}}
3323
+
3324
+ B. {{choices[1]}}
3325
+
3326
+ C. {{choices[2]}}
3327
+
3328
+ D. {{choices[3]}}
3329
+
3330
+ Answer:'
3331
+ doc_to_target: answer
3332
+ doc_to_choice:
3333
+ - A
3334
+ - B
3335
+ - C
3336
+ - D
3337
+ description: 'The following are multiple choice questions (with answers)
3338
+ about global facts.
3339
+
3340
+
3341
+ '
3342
+ target_delimiter: ' '
3343
+ fewshot_delimiter: '
3344
+
3345
+
3346
+ '
3347
+ fewshot_config:
3348
+ sampler: first_n
3349
+ metric_list:
3350
+ - metric: acc
3351
+ aggregation: mean
3352
+ higher_is_better: true
3353
+ output_type: multiple_choice
3354
+ repeats: 1
3355
+ should_decontaminate: false
3356
+ metadata:
3357
+ version: 0.0
3358
+ mmlu_high_school_biology:
3359
+ task: mmlu_high_school_biology
3360
+ task_alias: high_school_biology
3361
+ group: mmlu_stem
3362
+ group_alias: stem
3363
+ dataset_path: hails/mmlu_no_train
3364
+ dataset_name: high_school_biology
3365
+ test_split: test
3366
+ fewshot_split: dev
3367
+ doc_to_text: '{{question.strip()}}
3368
+
3369
+ A. {{choices[0]}}
3370
+
3371
+ B. {{choices[1]}}
3372
+
3373
+ C. {{choices[2]}}
3374
+
3375
+ D. {{choices[3]}}
3376
+
3377
+ Answer:'
3378
+ doc_to_target: answer
3379
+ doc_to_choice:
3380
+ - A
3381
+ - B
3382
+ - C
3383
+ - D
3384
+ description: 'The following are multiple choice questions (with answers)
3385
+ about high school biology.
3386
+
3387
+
3388
+ '
3389
+ target_delimiter: ' '
3390
+ fewshot_delimiter: '
3391
+
3392
+
3393
+ '
3394
+ fewshot_config:
3395
+ sampler: first_n
3396
+ metric_list:
3397
+ - metric: acc
3398
+ aggregation: mean
3399
+ higher_is_better: true
3400
+ output_type: multiple_choice
3401
+ repeats: 1
3402
+ should_decontaminate: false
3403
+ metadata:
3404
+ version: 0.0
3405
+ mmlu_high_school_chemistry:
3406
+ task: mmlu_high_school_chemistry
3407
+ task_alias: high_school_chemistry
3408
+ group: mmlu_stem
3409
+ group_alias: stem
3410
+ dataset_path: hails/mmlu_no_train
3411
+ dataset_name: high_school_chemistry
3412
+ test_split: test
3413
+ fewshot_split: dev
3414
+ doc_to_text: '{{question.strip()}}
3415
+
3416
+ A. {{choices[0]}}
3417
+
3418
+ B. {{choices[1]}}
3419
+
3420
+ C. {{choices[2]}}
3421
+
3422
+ D. {{choices[3]}}
3423
+
3424
+ Answer:'
3425
+ doc_to_target: answer
3426
+ doc_to_choice:
3427
+ - A
3428
+ - B
3429
+ - C
3430
+ - D
3431
+ description: 'The following are multiple choice questions (with answers)
3432
+ about high school chemistry.
3433
+
3434
+
3435
+ '
3436
+ target_delimiter: ' '
3437
+ fewshot_delimiter: '
3438
+
3439
+
3440
+ '
3441
+ fewshot_config:
3442
+ sampler: first_n
3443
+ metric_list:
3444
+ - metric: acc
3445
+ aggregation: mean
3446
+ higher_is_better: true
3447
+ output_type: multiple_choice
3448
+ repeats: 1
3449
+ should_decontaminate: false
3450
+ metadata:
3451
+ version: 0.0
3452
+ mmlu_high_school_computer_science:
3453
+ task: mmlu_high_school_computer_science
3454
+ task_alias: high_school_computer_science
3455
+ group: mmlu_stem
3456
+ group_alias: stem
3457
+ dataset_path: hails/mmlu_no_train
3458
+ dataset_name: high_school_computer_science
3459
+ test_split: test
3460
+ fewshot_split: dev
3461
+ doc_to_text: '{{question.strip()}}
3462
+
3463
+ A. {{choices[0]}}
3464
+
3465
+ B. {{choices[1]}}
3466
+
3467
+ C. {{choices[2]}}
3468
+
3469
+ D. {{choices[3]}}
3470
+
3471
+ Answer:'
3472
+ doc_to_target: answer
3473
+ doc_to_choice:
3474
+ - A
3475
+ - B
3476
+ - C
3477
+ - D
3478
+ description: 'The following are multiple choice questions (with answers)
3479
+ about high school computer science.
3480
+
3481
+
3482
+ '
3483
+ target_delimiter: ' '
3484
+ fewshot_delimiter: '
3485
+
3486
+
3487
+ '
3488
+ fewshot_config:
3489
+ sampler: first_n
3490
+ metric_list:
3491
+ - metric: acc
3492
+ aggregation: mean
3493
+ higher_is_better: true
3494
+ output_type: multiple_choice
3495
+ repeats: 1
3496
+ should_decontaminate: false
3497
+ metadata:
3498
+ version: 0.0
3499
+ mmlu_high_school_european_history:
3500
+ task: mmlu_high_school_european_history
3501
+ task_alias: high_school_european_history
3502
+ group: mmlu_humanities
3503
+ group_alias: humanities
3504
+ dataset_path: hails/mmlu_no_train
3505
+ dataset_name: high_school_european_history
3506
+ test_split: test
3507
+ fewshot_split: dev
3508
+ doc_to_text: '{{question.strip()}}
3509
+
3510
+ A. {{choices[0]}}
3511
+
3512
+ B. {{choices[1]}}
3513
+
3514
+ C. {{choices[2]}}
3515
+
3516
+ D. {{choices[3]}}
3517
+
3518
+ Answer:'
3519
+ doc_to_target: answer
3520
+ doc_to_choice:
3521
+ - A
3522
+ - B
3523
+ - C
3524
+ - D
3525
+ description: 'The following are multiple choice questions (with answers)
3526
+ about high school european history.
3527
+
3528
+
3529
+ '
3530
+ target_delimiter: ' '
3531
+ fewshot_delimiter: '
3532
+
3533
+
3534
+ '
3535
+ fewshot_config:
3536
+ sampler: first_n
3537
+ metric_list:
3538
+ - metric: acc
3539
+ aggregation: mean
3540
+ higher_is_better: true
3541
+ output_type: multiple_choice
3542
+ repeats: 1
3543
+ should_decontaminate: false
3544
+ metadata:
3545
+ version: 0.0
3546
+ mmlu_high_school_geography:
3547
+ task: mmlu_high_school_geography
3548
+ task_alias: high_school_geography
3549
+ group: mmlu_social_sciences
3550
+ group_alias: social_sciences
3551
+ dataset_path: hails/mmlu_no_train
3552
+ dataset_name: high_school_geography
3553
+ test_split: test
3554
+ fewshot_split: dev
3555
+ doc_to_text: '{{question.strip()}}
3556
+
3557
+ A. {{choices[0]}}
3558
+
3559
+ B. {{choices[1]}}
3560
+
3561
+ C. {{choices[2]}}
3562
+
3563
+ D. {{choices[3]}}
3564
+
3565
+ Answer:'
3566
+ doc_to_target: answer
3567
+ doc_to_choice:
3568
+ - A
3569
+ - B
3570
+ - C
3571
+ - D
3572
+ description: 'The following are multiple choice questions (with answers)
3573
+ about high school geography.
3574
+
3575
+
3576
+ '
3577
+ target_delimiter: ' '
3578
+ fewshot_delimiter: '
3579
+
3580
+
3581
+ '
3582
+ fewshot_config:
3583
+ sampler: first_n
3584
+ metric_list:
3585
+ - metric: acc
3586
+ aggregation: mean
3587
+ higher_is_better: true
3588
+ output_type: multiple_choice
3589
+ repeats: 1
3590
+ should_decontaminate: false
3591
+ metadata:
3592
+ version: 0.0
3593
+ mmlu_high_school_government_and_politics:
3594
+ task: mmlu_high_school_government_and_politics
3595
+ task_alias: high_school_government_and_politics
3596
+ group: mmlu_social_sciences
3597
+ group_alias: social_sciences
3598
+ dataset_path: hails/mmlu_no_train
3599
+ dataset_name: high_school_government_and_politics
3600
+ test_split: test
3601
+ fewshot_split: dev
3602
+ doc_to_text: '{{question.strip()}}
3603
+
3604
+ A. {{choices[0]}}
3605
+
3606
+ B. {{choices[1]}}
3607
+
3608
+ C. {{choices[2]}}
3609
+
3610
+ D. {{choices[3]}}
3611
+
3612
+ Answer:'
3613
+ doc_to_target: answer
3614
+ doc_to_choice:
3615
+ - A
3616
+ - B
3617
+ - C
3618
+ - D
3619
+ description: 'The following are multiple choice questions (with answers)
3620
+ about high school government and politics.
3621
+
3622
+
3623
+ '
3624
+ target_delimiter: ' '
3625
+ fewshot_delimiter: '
3626
+
3627
+
3628
+ '
3629
+ fewshot_config:
3630
+ sampler: first_n
3631
+ metric_list:
3632
+ - metric: acc
3633
+ aggregation: mean
3634
+ higher_is_better: true
3635
+ output_type: multiple_choice
3636
+ repeats: 1
3637
+ should_decontaminate: false
3638
+ metadata:
3639
+ version: 0.0
3640
+ mmlu_high_school_macroeconomics:
3641
+ task: mmlu_high_school_macroeconomics
3642
+ task_alias: high_school_macroeconomics
3643
+ group: mmlu_social_sciences
3644
+ group_alias: social_sciences
3645
+ dataset_path: hails/mmlu_no_train
3646
+ dataset_name: high_school_macroeconomics
3647
+ test_split: test
3648
+ fewshot_split: dev
3649
+ doc_to_text: '{{question.strip()}}
3650
+
3651
+ A. {{choices[0]}}
3652
+
3653
+ B. {{choices[1]}}
3654
+
3655
+ C. {{choices[2]}}
3656
+
3657
+ D. {{choices[3]}}
3658
+
3659
+ Answer:'
3660
+ doc_to_target: answer
3661
+ doc_to_choice:
3662
+ - A
3663
+ - B
3664
+ - C
3665
+ - D
3666
+ description: 'The following are multiple choice questions (with answers)
3667
+ about high school macroeconomics.
3668
+
3669
+
3670
+ '
3671
+ target_delimiter: ' '
3672
+ fewshot_delimiter: '
3673
+
3674
+
3675
+ '
3676
+ fewshot_config:
3677
+ sampler: first_n
3678
+ metric_list:
3679
+ - metric: acc
3680
+ aggregation: mean
3681
+ higher_is_better: true
3682
+ output_type: multiple_choice
3683
+ repeats: 1
3684
+ should_decontaminate: false
3685
+ metadata:
3686
+ version: 0.0
3687
+ mmlu_high_school_mathematics:
3688
+ task: mmlu_high_school_mathematics
3689
+ task_alias: high_school_mathematics
3690
+ group: mmlu_stem
3691
+ group_alias: stem
3692
+ dataset_path: hails/mmlu_no_train
3693
+ dataset_name: high_school_mathematics
3694
+ test_split: test
3695
+ fewshot_split: dev
3696
+ doc_to_text: '{{question.strip()}}
3697
+
3698
+ A. {{choices[0]}}
3699
+
3700
+ B. {{choices[1]}}
3701
+
3702
+ C. {{choices[2]}}
3703
+
3704
+ D. {{choices[3]}}
3705
+
3706
+ Answer:'
3707
+ doc_to_target: answer
3708
+ doc_to_choice:
3709
+ - A
3710
+ - B
3711
+ - C
3712
+ - D
3713
+ description: 'The following are multiple choice questions (with answers)
3714
+ about high school mathematics.
3715
+
3716
+
3717
+ '
3718
+ target_delimiter: ' '
3719
+ fewshot_delimiter: '
3720
+
3721
+
3722
+ '
3723
+ fewshot_config:
3724
+ sampler: first_n
3725
+ metric_list:
3726
+ - metric: acc
3727
+ aggregation: mean
3728
+ higher_is_better: true
3729
+ output_type: multiple_choice
3730
+ repeats: 1
3731
+ should_decontaminate: false
3732
+ metadata:
3733
+ version: 0.0
3734
+ mmlu_high_school_microeconomics:
3735
+ task: mmlu_high_school_microeconomics
3736
+ task_alias: high_school_microeconomics
3737
+ group: mmlu_social_sciences
3738
+ group_alias: social_sciences
3739
+ dataset_path: hails/mmlu_no_train
3740
+ dataset_name: high_school_microeconomics
3741
+ test_split: test
3742
+ fewshot_split: dev
3743
+ doc_to_text: '{{question.strip()}}
3744
+
3745
+ A. {{choices[0]}}
3746
+
3747
+ B. {{choices[1]}}
3748
+
3749
+ C. {{choices[2]}}
3750
+
3751
+ D. {{choices[3]}}
3752
+
3753
+ Answer:'
3754
+ doc_to_target: answer
3755
+ doc_to_choice:
3756
+ - A
3757
+ - B
3758
+ - C
3759
+ - D
3760
+ description: 'The following are multiple choice questions (with answers)
3761
+ about high school microeconomics.
3762
+
3763
+
3764
+ '
3765
+ target_delimiter: ' '
3766
+ fewshot_delimiter: '
3767
+
3768
+
3769
+ '
3770
+ fewshot_config:
3771
+ sampler: first_n
3772
+ metric_list:
3773
+ - metric: acc
3774
+ aggregation: mean
3775
+ higher_is_better: true
3776
+ output_type: multiple_choice
3777
+ repeats: 1
3778
+ should_decontaminate: false
3779
+ metadata:
3780
+ version: 0.0
3781
+ mmlu_high_school_physics:
3782
+ task: mmlu_high_school_physics
3783
+ task_alias: high_school_physics
3784
+ group: mmlu_stem
3785
+ group_alias: stem
3786
+ dataset_path: hails/mmlu_no_train
3787
+ dataset_name: high_school_physics
3788
+ test_split: test
3789
+ fewshot_split: dev
3790
+ doc_to_text: '{{question.strip()}}
3791
+
3792
+ A. {{choices[0]}}
3793
+
3794
+ B. {{choices[1]}}
3795
+
3796
+ C. {{choices[2]}}
3797
+
3798
+ D. {{choices[3]}}
3799
+
3800
+ Answer:'
3801
+ doc_to_target: answer
3802
+ doc_to_choice:
3803
+ - A
3804
+ - B
3805
+ - C
3806
+ - D
3807
+ description: 'The following are multiple choice questions (with answers)
3808
+ about high school physics.
3809
+
3810
+
3811
+ '
3812
+ target_delimiter: ' '
3813
+ fewshot_delimiter: '
3814
+
3815
+
3816
+ '
3817
+ fewshot_config:
3818
+ sampler: first_n
3819
+ metric_list:
3820
+ - metric: acc
3821
+ aggregation: mean
3822
+ higher_is_better: true
3823
+ output_type: multiple_choice
3824
+ repeats: 1
3825
+ should_decontaminate: false
3826
+ metadata:
3827
+ version: 0.0
3828
+ mmlu_high_school_psychology:
3829
+ task: mmlu_high_school_psychology
3830
+ task_alias: high_school_psychology
3831
+ group: mmlu_social_sciences
3832
+ group_alias: social_sciences
3833
+ dataset_path: hails/mmlu_no_train
3834
+ dataset_name: high_school_psychology
3835
+ test_split: test
3836
+ fewshot_split: dev
3837
+ doc_to_text: '{{question.strip()}}
3838
+
3839
+ A. {{choices[0]}}
3840
+
3841
+ B. {{choices[1]}}
3842
+
3843
+ C. {{choices[2]}}
3844
+
3845
+ D. {{choices[3]}}
3846
+
3847
+ Answer:'
3848
+ doc_to_target: answer
3849
+ doc_to_choice:
3850
+ - A
3851
+ - B
3852
+ - C
3853
+ - D
3854
+ description: 'The following are multiple choice questions (with answers)
3855
+ about high school psychology.
3856
+
3857
+
3858
+ '
3859
+ target_delimiter: ' '
3860
+ fewshot_delimiter: '
3861
+
3862
+
3863
+ '
3864
+ fewshot_config:
3865
+ sampler: first_n
3866
+ metric_list:
3867
+ - metric: acc
3868
+ aggregation: mean
3869
+ higher_is_better: true
3870
+ output_type: multiple_choice
3871
+ repeats: 1
3872
+ should_decontaminate: false
3873
+ metadata:
3874
+ version: 0.0
3875
+ mmlu_high_school_statistics:
3876
+ task: mmlu_high_school_statistics
3877
+ task_alias: high_school_statistics
3878
+ group: mmlu_stem
3879
+ group_alias: stem
3880
+ dataset_path: hails/mmlu_no_train
3881
+ dataset_name: high_school_statistics
3882
+ test_split: test
3883
+ fewshot_split: dev
3884
+ doc_to_text: '{{question.strip()}}
3885
+
3886
+ A. {{choices[0]}}
3887
+
3888
+ B. {{choices[1]}}
3889
+
3890
+ C. {{choices[2]}}
3891
+
3892
+ D. {{choices[3]}}
3893
+
3894
+ Answer:'
3895
+ doc_to_target: answer
3896
+ doc_to_choice:
3897
+ - A
3898
+ - B
3899
+ - C
3900
+ - D
3901
+ description: 'The following are multiple choice questions (with answers)
3902
+ about high school statistics.
3903
+
3904
+
3905
+ '
3906
+ target_delimiter: ' '
3907
+ fewshot_delimiter: '
3908
+
3909
+
3910
+ '
3911
+ fewshot_config:
3912
+ sampler: first_n
3913
+ metric_list:
3914
+ - metric: acc
3915
+ aggregation: mean
3916
+ higher_is_better: true
3917
+ output_type: multiple_choice
3918
+ repeats: 1
3919
+ should_decontaminate: false
3920
+ metadata:
3921
+ version: 0.0
3922
+ mmlu_high_school_us_history:
3923
+ task: mmlu_high_school_us_history
3924
+ task_alias: high_school_us_history
3925
+ group: mmlu_humanities
3926
+ group_alias: humanities
3927
+ dataset_path: hails/mmlu_no_train
3928
+ dataset_name: high_school_us_history
3929
+ test_split: test
3930
+ fewshot_split: dev
3931
+ doc_to_text: '{{question.strip()}}
3932
+
3933
+ A. {{choices[0]}}
3934
+
3935
+ B. {{choices[1]}}
3936
+
3937
+ C. {{choices[2]}}
3938
+
3939
+ D. {{choices[3]}}
3940
+
3941
+ Answer:'
3942
+ doc_to_target: answer
3943
+ doc_to_choice:
3944
+ - A
3945
+ - B
3946
+ - C
3947
+ - D
3948
+ description: 'The following are multiple choice questions (with answers)
3949
+ about high school us history.
3950
+
3951
+
3952
+ '
3953
+ target_delimiter: ' '
3954
+ fewshot_delimiter: '
3955
+
3956
+
3957
+ '
3958
+ fewshot_config:
3959
+ sampler: first_n
3960
+ metric_list:
3961
+ - metric: acc
3962
+ aggregation: mean
3963
+ higher_is_better: true
3964
+ output_type: multiple_choice
3965
+ repeats: 1
3966
+ should_decontaminate: false
3967
+ metadata:
3968
+ version: 0.0
3969
+ mmlu_high_school_world_history:
3970
+ task: mmlu_high_school_world_history
3971
+ task_alias: high_school_world_history
3972
+ group: mmlu_humanities
3973
+ group_alias: humanities
3974
+ dataset_path: hails/mmlu_no_train
3975
+ dataset_name: high_school_world_history
3976
+ test_split: test
3977
+ fewshot_split: dev
3978
+ doc_to_text: '{{question.strip()}}
3979
+
3980
+ A. {{choices[0]}}
3981
+
3982
+ B. {{choices[1]}}
3983
+
3984
+ C. {{choices[2]}}
3985
+
3986
+ D. {{choices[3]}}
3987
+
3988
+ Answer:'
3989
+ doc_to_target: answer
3990
+ doc_to_choice:
3991
+ - A
3992
+ - B
3993
+ - C
3994
+ - D
3995
+ description: 'The following are multiple choice questions (with answers)
3996
+ about high school world history.
3997
+
3998
+
3999
+ '
4000
+ target_delimiter: ' '
4001
+ fewshot_delimiter: '
4002
+
4003
+
4004
+ '
4005
+ fewshot_config:
4006
+ sampler: first_n
4007
+ metric_list:
4008
+ - metric: acc
4009
+ aggregation: mean
4010
+ higher_is_better: true
4011
+ output_type: multiple_choice
4012
+ repeats: 1
4013
+ should_decontaminate: false
4014
+ metadata:
4015
+ version: 0.0
4016
+ mmlu_human_aging:
4017
+ task: mmlu_human_aging
4018
+ task_alias: human_aging
4019
+ group: mmlu_other
4020
+ group_alias: other
4021
+ dataset_path: hails/mmlu_no_train
4022
+ dataset_name: human_aging
4023
+ test_split: test
4024
+ fewshot_split: dev
4025
+ doc_to_text: '{{question.strip()}}
4026
+
4027
+ A. {{choices[0]}}
4028
+
4029
+ B. {{choices[1]}}
4030
+
4031
+ C. {{choices[2]}}
4032
+
4033
+ D. {{choices[3]}}
4034
+
4035
+ Answer:'
4036
+ doc_to_target: answer
4037
+ doc_to_choice:
4038
+ - A
4039
+ - B
4040
+ - C
4041
+ - D
4042
+ description: 'The following are multiple choice questions (with answers)
4043
+ about human aging.
4044
+
4045
+
4046
+ '
4047
+ target_delimiter: ' '
4048
+ fewshot_delimiter: '
4049
+
4050
+
4051
+ '
4052
+ fewshot_config:
4053
+ sampler: first_n
4054
+ metric_list:
4055
+ - metric: acc
4056
+ aggregation: mean
4057
+ higher_is_better: true
4058
+ output_type: multiple_choice
4059
+ repeats: 1
4060
+ should_decontaminate: false
4061
+ metadata:
4062
+ version: 0.0
4063
+ mmlu_human_sexuality:
4064
+ task: mmlu_human_sexuality
4065
+ task_alias: human_sexuality
4066
+ group: mmlu_social_sciences
4067
+ group_alias: social_sciences
4068
+ dataset_path: hails/mmlu_no_train
4069
+ dataset_name: human_sexuality
4070
+ test_split: test
4071
+ fewshot_split: dev
4072
+ doc_to_text: '{{question.strip()}}
4073
+
4074
+ A. {{choices[0]}}
4075
+
4076
+ B. {{choices[1]}}
4077
+
4078
+ C. {{choices[2]}}
4079
+
4080
+ D. {{choices[3]}}
4081
+
4082
+ Answer:'
4083
+ doc_to_target: answer
4084
+ doc_to_choice:
4085
+ - A
4086
+ - B
4087
+ - C
4088
+ - D
4089
+ description: 'The following are multiple choice questions (with answers)
4090
+ about human sexuality.
4091
+
4092
+
4093
+ '
4094
+ target_delimiter: ' '
4095
+ fewshot_delimiter: '
4096
+
4097
+
4098
+ '
4099
+ fewshot_config:
4100
+ sampler: first_n
4101
+ metric_list:
4102
+ - metric: acc
4103
+ aggregation: mean
4104
+ higher_is_better: true
4105
+ output_type: multiple_choice
4106
+ repeats: 1
4107
+ should_decontaminate: false
4108
+ metadata:
4109
+ version: 0.0
4110
+ mmlu_international_law:
4111
+ task: mmlu_international_law
4112
+ task_alias: international_law
4113
+ group: mmlu_humanities
4114
+ group_alias: humanities
4115
+ dataset_path: hails/mmlu_no_train
4116
+ dataset_name: international_law
4117
+ test_split: test
4118
+ fewshot_split: dev
4119
+ doc_to_text: '{{question.strip()}}
4120
+
4121
+ A. {{choices[0]}}
4122
+
4123
+ B. {{choices[1]}}
4124
+
4125
+ C. {{choices[2]}}
4126
+
4127
+ D. {{choices[3]}}
4128
+
4129
+ Answer:'
4130
+ doc_to_target: answer
4131
+ doc_to_choice:
4132
+ - A
4133
+ - B
4134
+ - C
4135
+ - D
4136
+ description: 'The following are multiple choice questions (with answers)
4137
+ about international law.
4138
+
4139
+
4140
+ '
4141
+ target_delimiter: ' '
4142
+ fewshot_delimiter: '
4143
+
4144
+
4145
+ '
4146
+ fewshot_config:
4147
+ sampler: first_n
4148
+ metric_list:
4149
+ - metric: acc
4150
+ aggregation: mean
4151
+ higher_is_better: true
4152
+ output_type: multiple_choice
4153
+ repeats: 1
4154
+ should_decontaminate: false
4155
+ metadata:
4156
+ version: 0.0
4157
+ mmlu_jurisprudence:
4158
+ task: mmlu_jurisprudence
4159
+ task_alias: jurisprudence
4160
+ group: mmlu_humanities
4161
+ group_alias: humanities
4162
+ dataset_path: hails/mmlu_no_train
4163
+ dataset_name: jurisprudence
4164
+ test_split: test
4165
+ fewshot_split: dev
4166
+ doc_to_text: '{{question.strip()}}
4167
+
4168
+ A. {{choices[0]}}
4169
+
4170
+ B. {{choices[1]}}
4171
+
4172
+ C. {{choices[2]}}
4173
+
4174
+ D. {{choices[3]}}
4175
+
4176
+ Answer:'
4177
+ doc_to_target: answer
4178
+ doc_to_choice:
4179
+ - A
4180
+ - B
4181
+ - C
4182
+ - D
4183
+ description: 'The following are multiple choice questions (with answers)
4184
+ about jurisprudence.
4185
+
4186
+
4187
+ '
4188
+ target_delimiter: ' '
4189
+ fewshot_delimiter: '
4190
+
4191
+
4192
+ '
4193
+ fewshot_config:
4194
+ sampler: first_n
4195
+ metric_list:
4196
+ - metric: acc
4197
+ aggregation: mean
4198
+ higher_is_better: true
4199
+ output_type: multiple_choice
4200
+ repeats: 1
4201
+ should_decontaminate: false
4202
+ metadata:
4203
+ version: 0.0
4204
+ mmlu_logical_fallacies:
4205
+ task: mmlu_logical_fallacies
4206
+ task_alias: logical_fallacies
4207
+ group: mmlu_humanities
4208
+ group_alias: humanities
4209
+ dataset_path: hails/mmlu_no_train
4210
+ dataset_name: logical_fallacies
4211
+ test_split: test
4212
+ fewshot_split: dev
4213
+ doc_to_text: '{{question.strip()}}
4214
+
4215
+ A. {{choices[0]}}
4216
+
4217
+ B. {{choices[1]}}
4218
+
4219
+ C. {{choices[2]}}
4220
+
4221
+ D. {{choices[3]}}
4222
+
4223
+ Answer:'
4224
+ doc_to_target: answer
4225
+ doc_to_choice:
4226
+ - A
4227
+ - B
4228
+ - C
4229
+ - D
4230
+ description: 'The following are multiple choice questions (with answers)
4231
+ about logical fallacies.
4232
+
4233
+
4234
+ '
4235
+ target_delimiter: ' '
4236
+ fewshot_delimiter: '
4237
+
4238
+
4239
+ '
4240
+ fewshot_config:
4241
+ sampler: first_n
4242
+ metric_list:
4243
+ - metric: acc
4244
+ aggregation: mean
4245
+ higher_is_better: true
4246
+ output_type: multiple_choice
4247
+ repeats: 1
4248
+ should_decontaminate: false
4249
+ metadata:
4250
+ version: 0.0
4251
+ mmlu_machine_learning:
4252
+ task: mmlu_machine_learning
4253
+ task_alias: machine_learning
4254
+ group: mmlu_stem
4255
+ group_alias: stem
4256
+ dataset_path: hails/mmlu_no_train
4257
+ dataset_name: machine_learning
4258
+ test_split: test
4259
+ fewshot_split: dev
4260
+ doc_to_text: '{{question.strip()}}
4261
+
4262
+ A. {{choices[0]}}
4263
+
4264
+ B. {{choices[1]}}
4265
+
4266
+ C. {{choices[2]}}
4267
+
4268
+ D. {{choices[3]}}
4269
+
4270
+ Answer:'
4271
+ doc_to_target: answer
4272
+ doc_to_choice:
4273
+ - A
4274
+ - B
4275
+ - C
4276
+ - D
4277
+ description: 'The following are multiple choice questions (with answers)
4278
+ about machine learning.
4279
+
4280
+
4281
+ '
4282
+ target_delimiter: ' '
4283
+ fewshot_delimiter: '
4284
+
4285
+
4286
+ '
4287
+ fewshot_config:
4288
+ sampler: first_n
4289
+ metric_list:
4290
+ - metric: acc
4291
+ aggregation: mean
4292
+ higher_is_better: true
4293
+ output_type: multiple_choice
4294
+ repeats: 1
4295
+ should_decontaminate: false
4296
+ metadata:
4297
+ version: 0.0
4298
+ mmlu_management:
4299
+ task: mmlu_management
4300
+ task_alias: management
4301
+ group: mmlu_other
4302
+ group_alias: other
4303
+ dataset_path: hails/mmlu_no_train
4304
+ dataset_name: management
4305
+ test_split: test
4306
+ fewshot_split: dev
4307
+ doc_to_text: '{{question.strip()}}
4308
+
4309
+ A. {{choices[0]}}
4310
+
4311
+ B. {{choices[1]}}
4312
+
4313
+ C. {{choices[2]}}
4314
+
4315
+ D. {{choices[3]}}
4316
+
4317
+ Answer:'
4318
+ doc_to_target: answer
4319
+ doc_to_choice:
4320
+ - A
4321
+ - B
4322
+ - C
4323
+ - D
4324
+ description: 'The following are multiple choice questions (with answers)
4325
+ about management.
4326
+
4327
+
4328
+ '
4329
+ target_delimiter: ' '
4330
+ fewshot_delimiter: '
4331
+
4332
+
4333
+ '
4334
+ fewshot_config:
4335
+ sampler: first_n
4336
+ metric_list:
4337
+ - metric: acc
4338
+ aggregation: mean
4339
+ higher_is_better: true
4340
+ output_type: multiple_choice
4341
+ repeats: 1
4342
+ should_decontaminate: false
4343
+ metadata:
4344
+ version: 0.0
4345
+ mmlu_marketing:
4346
+ task: mmlu_marketing
4347
+ task_alias: marketing
4348
+ group: mmlu_other
4349
+ group_alias: other
4350
+ dataset_path: hails/mmlu_no_train
4351
+ dataset_name: marketing
4352
+ test_split: test
4353
+ fewshot_split: dev
4354
+ doc_to_text: '{{question.strip()}}
4355
+
4356
+ A. {{choices[0]}}
4357
+
4358
+ B. {{choices[1]}}
4359
+
4360
+ C. {{choices[2]}}
4361
+
4362
+ D. {{choices[3]}}
4363
+
4364
+ Answer:'
4365
+ doc_to_target: answer
4366
+ doc_to_choice:
4367
+ - A
4368
+ - B
4369
+ - C
4370
+ - D
4371
+ description: 'The following are multiple choice questions (with answers)
4372
+ about marketing.
4373
+
4374
+
4375
+ '
4376
+ target_delimiter: ' '
4377
+ fewshot_delimiter: '
4378
+
4379
+
4380
+ '
4381
+ fewshot_config:
4382
+ sampler: first_n
4383
+ metric_list:
4384
+ - metric: acc
4385
+ aggregation: mean
4386
+ higher_is_better: true
4387
+ output_type: multiple_choice
4388
+ repeats: 1
4389
+ should_decontaminate: false
4390
+ metadata:
4391
+ version: 0.0
4392
+ mmlu_medical_genetics:
4393
+ task: mmlu_medical_genetics
4394
+ task_alias: medical_genetics
4395
+ group: mmlu_other
4396
+ group_alias: other
4397
+ dataset_path: hails/mmlu_no_train
4398
+ dataset_name: medical_genetics
4399
+ test_split: test
4400
+ fewshot_split: dev
4401
+ doc_to_text: '{{question.strip()}}
4402
+
4403
+ A. {{choices[0]}}
4404
+
4405
+ B. {{choices[1]}}
4406
+
4407
+ C. {{choices[2]}}
4408
+
4409
+ D. {{choices[3]}}
4410
+
4411
+ Answer:'
4412
+ doc_to_target: answer
4413
+ doc_to_choice:
4414
+ - A
4415
+ - B
4416
+ - C
4417
+ - D
4418
+ description: 'The following are multiple choice questions (with answers)
4419
+ about medical genetics.
4420
+
4421
+
4422
+ '
4423
+ target_delimiter: ' '
4424
+ fewshot_delimiter: '
4425
+
4426
+
4427
+ '
4428
+ fewshot_config:
4429
+ sampler: first_n
4430
+ metric_list:
4431
+ - metric: acc
4432
+ aggregation: mean
4433
+ higher_is_better: true
4434
+ output_type: multiple_choice
4435
+ repeats: 1
4436
+ should_decontaminate: false
4437
+ metadata:
4438
+ version: 0.0
4439
+ mmlu_miscellaneous:
4440
+ task: mmlu_miscellaneous
4441
+ task_alias: miscellaneous
4442
+ group: mmlu_other
4443
+ group_alias: other
4444
+ dataset_path: hails/mmlu_no_train
4445
+ dataset_name: miscellaneous
4446
+ test_split: test
4447
+ fewshot_split: dev
4448
+ doc_to_text: '{{question.strip()}}
4449
+
4450
+ A. {{choices[0]}}
4451
+
4452
+ B. {{choices[1]}}
4453
+
4454
+ C. {{choices[2]}}
4455
+
4456
+ D. {{choices[3]}}
4457
+
4458
+ Answer:'
4459
+ doc_to_target: answer
4460
+ doc_to_choice:
4461
+ - A
4462
+ - B
4463
+ - C
4464
+ - D
4465
+ description: 'The following are multiple choice questions (with answers)
4466
+ about miscellaneous.
4467
+
4468
+
4469
+ '
4470
+ target_delimiter: ' '
4471
+ fewshot_delimiter: '
4472
+
4473
+
4474
+ '
4475
+ fewshot_config:
4476
+ sampler: first_n
4477
+ metric_list:
4478
+ - metric: acc
4479
+ aggregation: mean
4480
+ higher_is_better: true
4481
+ output_type: multiple_choice
4482
+ repeats: 1
4483
+ should_decontaminate: false
4484
+ metadata:
4485
+ version: 0.0
4486
+ mmlu_moral_disputes:
+ task: mmlu_moral_disputes
+ task_alias: moral_disputes
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: moral_disputes
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about moral disputes.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_moral_scenarios:
+ task: mmlu_moral_scenarios
+ task_alias: moral_scenarios
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: moral_scenarios
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about moral scenarios.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_nutrition:
+ task: mmlu_nutrition
+ task_alias: nutrition
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: nutrition
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about nutrition.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_philosophy:
+ task: mmlu_philosophy
+ task_alias: philosophy
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: philosophy
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about philosophy.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_prehistory:
+ task: mmlu_prehistory
+ task_alias: prehistory
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: prehistory
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about prehistory.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_professional_accounting:
+ task: mmlu_professional_accounting
+ task_alias: professional_accounting
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: professional_accounting
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about professional accounting.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_professional_law:
+ task: mmlu_professional_law
+ task_alias: professional_law
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: professional_law
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about professional law.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_professional_medicine:
+ task: mmlu_professional_medicine
+ task_alias: professional_medicine
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: professional_medicine
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about professional medicine.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_professional_psychology:
+ task: mmlu_professional_psychology
+ task_alias: professional_psychology
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: professional_psychology
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about professional psychology.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_public_relations:
+ task: mmlu_public_relations
+ task_alias: public_relations
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: public_relations
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about public relations.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_security_studies:
+ task: mmlu_security_studies
+ task_alias: security_studies
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: security_studies
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about security studies.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_sociology:
+ task: mmlu_sociology
+ task_alias: sociology
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: sociology
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about sociology.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_us_foreign_policy:
+ task: mmlu_us_foreign_policy
+ task_alias: us_foreign_policy
+ group: mmlu_social_sciences
+ group_alias: social_sciences
+ dataset_path: hails/mmlu_no_train
+ dataset_name: us_foreign_policy
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about us foreign policy.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_virology:
+ task: mmlu_virology
+ task_alias: virology
+ group: mmlu_other
+ group_alias: other
+ dataset_path: hails/mmlu_no_train
+ dataset_name: virology
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about virology.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ mmlu_world_religions:
+ task: mmlu_world_religions
+ task_alias: world_religions
+ group: mmlu_humanities
+ group_alias: humanities
+ dataset_path: hails/mmlu_no_train
+ dataset_name: world_religions
+ test_split: test
+ fewshot_split: dev
+ doc_to_text: '{{question.strip()}}
+
+ A. {{choices[0]}}
+
+ B. {{choices[1]}}
+
+ C. {{choices[2]}}
+
+ D. {{choices[3]}}
+
+ Answer:'
+ doc_to_target: answer
+ doc_to_choice:
+ - A
+ - B
+ - C
+ - D
+ description: 'The following are multiple choice questions (with answers)
+ about world religions.
+
+
+ '
+ target_delimiter: ' '
+ fewshot_delimiter: '
+
+
+ '
+ fewshot_config:
+ sampler: first_n
+ metric_list:
+ - metric: acc
+ aggregation: mean
+ higher_is_better: true
+ output_type: multiple_choice
+ repeats: 1
+ should_decontaminate: false
+ metadata:
+ version: 0.0
+ versions:
+ mmlu_abstract_algebra: 0.0
+ mmlu_anatomy: 0.0
+ mmlu_astronomy: 0.0
+ mmlu_business_ethics: 0.0
+ mmlu_clinical_knowledge: 0.0
+ mmlu_college_biology: 0.0
+ mmlu_college_chemistry: 0.0
+ mmlu_college_computer_science: 0.0
+ mmlu_college_mathematics: 0.0
+ mmlu_college_medicine: 0.0
+ mmlu_college_physics: 0.0
+ mmlu_computer_security: 0.0
+ mmlu_conceptual_physics: 0.0
+ mmlu_econometrics: 0.0
+ mmlu_electrical_engineering: 0.0
+ mmlu_elementary_mathematics: 0.0
+ mmlu_formal_logic: 0.0
+ mmlu_global_facts: 0.0
+ mmlu_high_school_biology: 0.0
+ mmlu_high_school_chemistry: 0.0
+ mmlu_high_school_computer_science: 0.0
+ mmlu_high_school_european_history: 0.0
+ mmlu_high_school_geography: 0.0
+ mmlu_high_school_government_and_politics: 0.0
+ mmlu_high_school_macroeconomics: 0.0
+ mmlu_high_school_mathematics: 0.0
+ mmlu_high_school_microeconomics: 0.0
+ mmlu_high_school_physics: 0.0
+ mmlu_high_school_psychology: 0.0
+ mmlu_high_school_statistics: 0.0
+ mmlu_high_school_us_history: 0.0
+ mmlu_high_school_world_history: 0.0
+ mmlu_human_aging: 0.0
+ mmlu_human_sexuality: 0.0
+ mmlu_international_law: 0.0
+ mmlu_jurisprudence: 0.0
+ mmlu_logical_fallacies: 0.0
+ mmlu_machine_learning: 0.0
+ mmlu_management: 0.0
+ mmlu_marketing: 0.0
+ mmlu_medical_genetics: 0.0
+ mmlu_miscellaneous: 0.0
+ mmlu_moral_disputes: 0.0
+ mmlu_moral_scenarios: 0.0
+ mmlu_nutrition: 0.0
+ mmlu_philosophy: 0.0
+ mmlu_prehistory: 0.0
+ mmlu_professional_accounting: 0.0
+ mmlu_professional_law: 0.0
+ mmlu_professional_medicine: 0.0
+ mmlu_professional_psychology: 0.0
+ mmlu_public_relations: 0.0
+ mmlu_security_studies: 0.0
+ mmlu_sociology: 0.0
+ mmlu_us_foreign_policy: 0.0
+ mmlu_virology: 0.0
+ mmlu_world_religions: 0.0
+ n-shot:
+ mmlu: 0
+ config:
+ model: vllm
+ model_args: pretrained=DataGuard/Llama-disco-pali-merged,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.8,max_model_len=2048,trust_remote_code=True
+ batch_size: auto
+ batch_sizes: []
+ bootstrap_iters: 100000
+ git_hash: cddf85d
+ pretty_env_info: 'PyTorch version: 2.1.2+cu121
+
+ Is debug build: False
+
+ CUDA used to build PyTorch: 12.1
+
+ ROCM used to build PyTorch: N/A
+
+
+ OS: Ubuntu 22.04.3 LTS (x86_64)
+
+ GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
+
+ Clang version: Could not collect
+
+ CMake version: version 3.25.0
+
+ Libc version: glibc-2.35
+
+
+ Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit
+ runtime)
+
+ Python platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
+
+ Is CUDA available: True
+
+ CUDA runtime version: 11.8.89
+
+ CUDA_MODULE_LOADING set to: LAZY
+
+ GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
+
+ Nvidia driver version: 550.54.15
+
+ cuDNN version: Could not collect
+
+ HIP runtime version: N/A
+
+ MIOpen runtime version: N/A
+
+ Is XNNPACK available: True
+
+
+ CPU:
+
+ Architecture: x86_64
+
+ CPU op-mode(s): 32-bit, 64-bit
+
+ Address sizes: 52 bits physical, 57 bits virtual
+
+ Byte Order: Little Endian
+
+ CPU(s): 64
+
+ On-line CPU(s) list: 0-63
+
+ Vendor ID: AuthenticAMD
+
+ Model name: AMD EPYC 9354 32-Core Processor
+
+ CPU family: 25
+
+ Model: 17
+
+ Thread(s) per core: 2
+
+ Core(s) per socket: 32
+
+ Socket(s): 1
+
+ Stepping: 1
+
+ Frequency boost: enabled
+
+ CPU max MHz: 3799.0720
+
+ CPU min MHz: 1500.0000
+
+ BogoMIPS: 6499.74
+
+ Flags: fpu vme de pse tsc msr pae mce cx8 apic
+ sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
+ mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl
+ nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3
+ fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand
+ lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
+ osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc
+ mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba perfmon_v2 ibrs
+ ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid
+ cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd
+ sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
+ cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd
+ amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
+ decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl
+ vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni
+ avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm
+ flush_l1d
+
+ Virtualization: AMD-V
+
+ L1d cache: 1 MiB (32 instances)
+
+ L1i cache: 1 MiB (32 instances)
+
+ L2 cache: 32 MiB (32 instances)
+
+ L3 cache: 256 MiB (8 instances)
+
+ NUMA node(s): 1
+
+ NUMA node0 CPU(s): 0-63
+
+ Vulnerability Gather data sampling: Not affected
+
+ Vulnerability Itlb multihit: Not affected
+
+ Vulnerability L1tf: Not affected
+
+ Vulnerability Mds: Not affected
+
+ Vulnerability Meltdown: Not affected
+
+ Vulnerability Mmio stale data: Not affected
+
+ Vulnerability Retbleed: Not affected
+
+ Vulnerability Spec rstack overflow: Mitigation; Safe RET
+
+ Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass
+ disabled via prctl
+
+ Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers
+ and __user pointer sanitization
+
+ Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS;
+ IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected;
+ BHI Not affected
+
+ Vulnerability Srbds: Not affected
+
+ Vulnerability Tsx async abort: Not affected
+
+
  Versions of relevant libraries:
 
  [pip3] numpy==1.24.1