ParaLlama-p-medium / README.md
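| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 25 | acc | 0.1809 | ± 0.0112 |
| | | none | 25 | acc_norm | 0.2201 | ± 0.0121 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.4543 | ± 0.0154 |
| winogrande | 1 | none | 5 | acc | 0.5154 | ± 0.014 |
| hellaswag | 1 | none | 10 | acc | 0.2822 | ± 0.0045 |
| | | none | 10 | acc_norm | 0.3009 | ± 0.0046 |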
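
As a reading aid for the Stderr column, the sketch below turns each reported value and standard error into an approximate 95% interval. It assumes a normal approximation (the 1.96 multiplier), which is not stated in the original results; the variable names are illustrative only.

```python
# Approximate 95% confidence intervals from the reported Value and Stderr.
# Assumption: normal approximation, i.e. value +/- 1.96 * stderr.
Z_95 = 1.96

# (task, metric, value, stderr) copied from the table above.
results = [
    ("arc_challenge", "acc", 0.1809, 0.0112),
    ("arc_challenge", "acc_norm", 0.2201, 0.0121),
    ("truthfulqa_mc2", "acc", 0.4543, 0.0154),
    ("winogrande", "acc", 0.5154, 0.014),
    ("hellaswag", "acc", 0.2822, 0.0045),
    ("hellaswag", "acc_norm", 0.3009, 0.0046),
]

def interval(value, stderr, z=Z_95):
    """Return the (low, high) bounds of the approximate interval."""
    return value - z * stderr, value + z * stderr

for task, metric, value, stderr in results:
    low, high = interval(value, stderr)
    print(f"{task:15s} {metric:9s} {value:.4f}  [{low:.4f}, {high:.4f}]")
```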

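MMLU (5-shot) average, computed as the unweighted mean of the 57 subtask accuracies in the table below: 0.26024912280701756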

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| abstract_algebra | 0 | none | 5 | acc | 0.3100 | ± 0.0465 |
| anatomy | 0 | none | 5 | acc | 0.2667 | ± 0.0382 |
| astronomy | 0 | none | 5 | acc | 0.1776 | ± 0.0311 |
| business_ethics | 0 | none | 5 | acc | 0.2200 | ± 0.0416 |
| clinical_knowledge | 0 | none | 5 | acc | 0.2528 | ± 0.0267 |
| college_biology | 0 | none | 5 | acc | 0.2153 | ± 0.0344 |
| college_chemistry | 0 | none | 5 | acc | 0.2300 | ± 0.0423 |
| college_computer_science | 0 | none | 5 | acc | 0.3400 | ± 0.0476 |
| college_mathematics | 0 | none | 5 | acc | 0.3200 | ± 0.0469 |
| college_medicine | 0 | none | 5 | acc | 0.2370 | ± 0.0324 |
| college_physics | 0 | none | 5 | acc | 0.1961 | ± 0.0395 |
| computer_security | 0 | none | 5 | acc | 0.2700 | ± 0.0446 |
| conceptual_physics | 0 | none | 5 | acc | 0.2383 | ± 0.0279 |
| econometrics | 0 | none | 5 | acc | 0.2982 | ± 0.0430 |
| electrical_engineering | 0 | none | 5 | acc | 0.2552 | ± 0.0363 |
| elementary_mathematics | 0 | none | 5 | acc | 0.2513 | ± 0.0223 |
| formal_logic | 0 | none | 5 | acc | 0.1667 | ± 0.0333 |
| global_facts | 0 | none | 5 | acc | 0.1600 | ± 0.0368 |
| high_school_biology | 0 | none | 5 | acc | 0.3000 | ± 0.0261 |
| high_school_chemistry | 0 | none | 5 | acc | 0.2167 | ± 0.0290 |
| high_school_computer_science | 0 | none | 5 | acc | 0.2300 | ± 0.0423 |
| high_school_european_history | 0 | none | 5 | acc | 0.2242 | ± 0.0326 |
| high_school_geography | 0 | none | 5 | acc | 0.3283 | ± 0.0335 |
| high_school_government_and_politics | 0 | none | 5 | acc | 0.3627 | ± 0.0347 |
| high_school_macroeconomics | 0 | none | 5 | acc | 0.3513 | ± 0.0242 |
| high_school_mathematics | 0 | none | 5 | acc | 0.2630 | ± 0.0268 |
| high_school_microeconomics | 0 | none | 5 | acc | 0.3067 | ± 0.0300 |
| high_school_physics | 0 | none | 5 | acc | 0.2583 | ± 0.0357 |
| high_school_psychology | 0 | none | 5 | acc | 0.3174 | ± 0.0200 |
| high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± 0.0340 |
| high_school_us_history | 0 | none | 5 | acc | 0.2353 | ± 0.0298 |
| high_school_world_history | 0 | none | 5 | acc | 0.2616 | ± 0.0286 |
| human_aging | 0 | none | 5 | acc | 0.2108 | ± 0.0274 |
| human_sexuality | 0 | none | 5 | acc | 0.2977 | ± 0.0401 |
| international_law | 0 | none | 5 | acc | 0.2645 | ± 0.0403 |
| jurisprudence | 0 | none | 5 | acc | 0.2130 | ± 0.0396 |
| logical_fallacies | 0 | none | 5 | acc | 0.2331 | ± 0.0332 |
| machine_learning | 0 | none | 5 | acc | 0.2857 | ± 0.0429 |
| management | 0 | none | 5 | acc | 0.1748 | ± 0.0376 |
| marketing | 0 | none | 5 | acc | 0.1838 | ± 0.0254 |
| medical_genetics | 0 | none | 5 | acc | 0.3000 | ± 0.0461 |
| miscellaneous | 0 | none | 5 | acc | 0.2720 | ± 0.0159 |
| moral_disputes | 0 | none | 5 | acc | 0.2457 | ± 0.0232 |
| moral_scenarios | 0 | none | 5 | acc | 0.2391 | ± 0.0143 |
| nutrition | 0 | none | 5 | acc | 0.2255 | ± 0.0239 |
| philosophy | 0 | none | 5 | acc | 0.1961 | ± 0.0226 |
| prehistory | 0 | none | 5 | acc | 0.2284 | ± 0.0234 |
| professional_accounting | 0 | none | 5 | acc | 0.2553 | ± 0.0260 |
| professional_law | 0 | none | 5 | acc | 0.2458 | ± 0.0110 |
| professional_medicine | 0 | none | 5 | acc | 0.4485 | ± 0.0302 |
| professional_psychology | 0 | none | 5 | acc | 0.2516 | ± 0.0176 |
| public_relations | 0 | none | 5 | acc | 0.2727 | ± 0.0427 |
| security_studies | 0 | none | 5 | acc | 0.3551 | ± 0.0306 |
| sociology | 0 | none | 5 | acc | 0.2587 | ± 0.0310 |
| us_foreign_policy | 0 | none | 5 | acc | 0.2100 | ± 0.0409 |
| virology | 0 | none | 5 | acc | 0.2229 | ± 0.0324 |
| world_religions | 0 | none | 5 | acc | 0.2105 | ± 0.0313 |
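
The standalone figure 0.26024912280701756 reported above this table matches the unweighted mean of the 57 subtask accuracies. The sketch below reproduces that arithmetic from the rounded table values; the variable names are illustrative, and treating every subtask equally (rather than weighting by question count) is an inference from the numbers, not something the original states.

```python
import math

# Subtask accuracies in table order (abstract_algebra ... world_religions).
mmlu_acc = [
    0.3100, 0.2667, 0.1776, 0.2200, 0.2528, 0.2153,
    0.2300, 0.3400, 0.3200, 0.2370, 0.1961, 0.2700,
    0.2383, 0.2982, 0.2552, 0.2513, 0.1667, 0.1600,
    0.3000, 0.2167, 0.2300, 0.2242, 0.3283, 0.3627,
    0.3513, 0.2630, 0.3067, 0.2583, 0.3174, 0.4722,
    0.2353, 0.2616, 0.2108, 0.2977, 0.2645, 0.2130,
    0.2331, 0.2857, 0.1748, 0.1838, 0.3000, 0.2720,
    0.2457, 0.2391, 0.2255, 0.1961, 0.2284, 0.2553,
    0.2458, 0.4485, 0.2516, 0.2727, 0.3551, 0.2587,
    0.2100, 0.2229, 0.2105,
]

# Unweighted (macro) mean: each subtask counts once.
mean_acc = math.fsum(mmlu_acc) / len(mmlu_acc)
print(len(mmlu_acc), round(mean_acc, 4))  # 57 0.2602
```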