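| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 25 | acc | 0.1775 | ± 0.0112 |
| | | none | 25 | acc_norm | 0.2167 | ± 0.0120 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.4689 | ± 0.0156 |
| winogrande | 1 | none | 5 | acc | 0.5122 | ± 0.0140 |
| hellaswag | 1 | none | 10 | acc | 0.2697 | ± 0.0044 |
| | | none | 10 | acc_norm | 0.2827 | ± 0.0045 |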

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± 0.0416 |
| anatomy | 0 | none | 5 | acc | 0.3333 | ± 0.0407 |
| astronomy | 0 | none | 5 | acc | 0.1776 | ± 0.0311 |
| business_ethics | 0 | none | 5 | acc | 0.2100 | ± 0.0409 |
| clinical_knowledge | 0 | none | 5 | acc | 0.2528 | ± 0.0267 |
| college_biology | 0 | none | 5 | acc | 0.2778 | ± 0.0375 |
| college_chemistry | 0 | none | 5 | acc | 0.1800 | ± 0.0386 |
| college_computer_science | 0 | none | 5 | acc | 0.1900 | ± 0.0394 |
| college_mathematics | 0 | none | 5 | acc | 0.2200 | ± 0.0416 |
| college_medicine | 0 | none | 5 | acc | 0.1965 | ± 0.0303 |
| college_physics | 0 | none | 5 | acc | 0.2451 | ± 0.0428 |
| computer_security | 0 | none | 5 | acc | 0.2000 | ± 0.0402 |
| conceptual_physics | 0 | none | 5 | acc | 0.2511 | ± 0.0283 |
| econometrics | 0 | none | 5 | acc | 0.2719 | ± 0.0419 |
| electrical_engineering | 0 | none | 5 | acc | 0.2138 | ± 0.0342 |
| elementary_mathematics | 0 | none | 5 | acc | 0.2460 | ± 0.0222 |
| formal_logic | 0 | none | 5 | acc | 0.1905 | ± 0.0351 |
| global_facts | 0 | none | 5 | acc | 0.1500 | ± 0.0359 |
| high_school_biology | 0 | none | 5 | acc | 0.3000 | ± 0.0261 |
| high_school_chemistry | 0 | none | 5 | acc | 0.2562 | ± 0.0307 |
| high_school_computer_science | 0 | none | 5 | acc | 0.2800 | ± 0.0451 |
| high_school_european_history | 0 | none | 5 | acc | 0.2788 | ± 0.0350 |
| high_school_geography | 0 | none | 5 | acc | 0.3232 | ± 0.0333 |
| high_school_government_and_politics | 0 | none | 5 | acc | 0.3212 | ± 0.0337 |
| high_school_macroeconomics | 0 | none | 5 | acc | 0.3308 | ± 0.0239 |
| high_school_mathematics | 0 | none | 5 | acc | 0.2593 | ± 0.0267 |
| high_school_microeconomics | 0 | none | 5 | acc | 0.2815 | ± 0.0292 |
| high_school_physics | 0 | none | 5 | acc | 0.2384 | ± 0.0348 |
| high_school_psychology | 0 | none | 5 | acc | 0.2716 | ± 0.0191 |
| high_school_statistics | 0 | none | 5 | acc | 0.4769 | ± 0.0341 |
| high_school_us_history | 0 | none | 5 | acc | 0.2598 | ± 0.0308 |
| high_school_world_history | 0 | none | 5 | acc | 0.2194 | ± 0.0269 |
| human_aging | 0 | none | 5 | acc | 0.2197 | ± 0.0278 |
| human_sexuality | 0 | none | 5 | acc | 0.2748 | ± 0.0392 |
| international_law | 0 | none | 5 | acc | 0.3306 | ± 0.0429 |
| jurisprudence | 0 | none | 5 | acc | 0.2130 | ± 0.0396 |
| logical_fallacies | 0 | none | 5 | acc | 0.2331 | ± 0.0332 |
| machine_learning | 0 | none | 5 | acc | 0.2232 | ± 0.0395 |
| management | 0 | none | 5 | acc | 0.2039 | ± 0.0399 |
| marketing | 0 | none | 5 | acc | 0.1966 | ± 0.0260 |
| medical_genetics | 0 | none | 5 | acc | 0.3000 | ± 0.0461 |
| miscellaneous | 0 | none | 5 | acc | 0.2580 | ± 0.0156 |
| moral_disputes | 0 | none | 5 | acc | 0.1850 | ± 0.0209 |
| moral_scenarios | 0 | none | 5 | acc | 0.2380 | ± 0.0142 |
| nutrition | 0 | none | 5 | acc | 0.3039 | ± 0.0263 |
| philosophy | 0 | none | 5 | acc | 0.1929 | ± 0.0224 |
| prehistory | 0 | none | 5 | acc | 0.2160 | ± 0.0229 |
| professional_accounting | 0 | none | 5 | acc | 0.2518 | ± 0.0259 |
| professional_law | 0 | none | 5 | acc | 0.2419 | ± 0.0109 |
| professional_medicine | 0 | none | 5 | acc | 0.4375 | ± 0.0301 |
| professional_psychology | 0 | none | 5 | acc | 0.2190 | ± 0.0167 |
| public_relations | 0 | none | 5 | acc | 0.2273 | ± 0.0401 |
| security_studies | 0 | none | 5 | acc | 0.3633 | ± 0.0308 |
| sociology | 0 | none | 5 | acc | 0.2338 | ± 0.0299 |
| us_foreign_policy | 0 | none | 5 | acc | 0.2900 | ± 0.0456 |
| virology | 0 | none | 5 | acc | 0.2169 | ± 0.0321 |
| world_religions | 0 | none | 5 | acc | 0.1930 | ± 0.0303 |
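
Many of the per-subject accuracies above sit close to 0.25. As a rough reading aid, the sketch below turns a reported value and standard error into an approximate 95% interval and checks it against a chance baseline. It makes two assumptions not stated in the results themselves: that each of these tasks is four-option multiple choice (so chance is 0.25), and that a normal approximation with z = 1.96 is adequate. The function names are illustrative only, and the four rows are copied from the results listed above.

```python
# Rows copied from the per-subject results listed above:
# (task, accuracy, standard error).
rows = [
    ("high_school_statistics", 0.4769, 0.0341),
    ("professional_medicine", 0.4375, 0.0301),
    ("global_facts", 0.1500, 0.0359),
    ("college_physics", 0.2451, 0.0428),
]

CHANCE = 0.25  # assumption: four answer options per question
Z = 1.96       # assumption: normal approximation, roughly a 95% interval


def interval(acc, stderr, z=Z):
    """Return the (low, high) bounds of acc +/- z * stderr."""
    return acc - z * stderr, acc + z * stderr


def compare_to_chance(acc, stderr, chance=CHANCE, z=Z):
    """Classify a score as 'above', 'below', or 'consistent' with chance."""
    low, high = interval(acc, stderr, z)
    if low > chance:
        return "above"
    if high < chance:
        return "below"
    return "consistent"


for task, acc, stderr in rows:
    low, high = interval(acc, stderr)
    verdict = compare_to_chance(acc, stderr)
    print(f"{task}: [{low:.4f}, {high:.4f}] -> {verdict}")
```

With several dozen subjects tested, a few intervals are expected to exclude 0.25 by chance alone, so a single "above" or "below" verdict is weak evidence on its own.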

Safetensors · Model size: 369M params · Tensor type: BF16

Dataset used to train crumb/Llama-p-small