hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 0, batch_size: 64

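| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| arc_easy | 0 | acc | 0.4322 | ± | 0.0102 |
| | | acc_norm | 0.3868 | ± | 0.0100 |
| boolq | 1 | acc | 0.6092 | ± | 0.0085 |
| lambada_openai | 0 | ppl | 74.2399 | ± | 2.9038 |
| | | acc | 0.2604 | ± | 0.0061 |
| openbookqa | 0 | acc | 0.1440 | ± | 0.0157 |
| | | acc_norm | 0.2780 | ± | 0.0201 |
| piqa | 0 | acc | 0.5909 | ± | 0.0115 |
| | | acc_norm | 0.5871 | ± | 0.0115 |
| winogrande | 0 | acc | 0.5225 | ± | 0.0140 |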
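
The Stderr values for the accuracy metrics look consistent with a binomial standard error computed from the sample (n - 1) variance of per-example 0/1 scores. The sketch below checks this for the winogrande row. The helper name `binomial_stderr` is hypothetical, and the example count of 1,267 is an assumption about the evaluation split, since this file does not state it.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Assumption: 1,267 winogrande examples were scored (not stated in this file).
estimate = binomial_stderr(0.5225, 1267)
print(f"{estimate:.4f}")  # compare against the reported Stderr of 0.0140
```

If the assumed count is wrong the estimate shifts accordingly, so treat this as a sanity check rather than a description of how the harness computes the column.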

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 25, batch_size: 64

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| arc_challenge | 0 | acc | 0.1817 | ± | 0.0113 |
| | | acc_norm | 0.2329 | ± | 0.0124 |

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 10, batch_size: 64

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| hellaswag | 0 | acc | 0.2792 | ± | 0.0045 |
| | | acc_norm | 0.2865 | ± | 0.0045 |

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 0, batch_size: 64

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| truthfulqa_mc | 1 | mc1 | 0.2485 | ± | 0.0151 |
| | | mc2 | 0.4594 | ± | 0.0151 |

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 5, batch_size: 64

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| hendrycksTest-abstract_algebra | 1 | acc | 0.2200 | ± | 0.0416 |
| | | acc_norm | 0.2200 | ± | 0.0416 |
| hendrycksTest-anatomy | 1 | acc | 0.2741 | ± | 0.0385 |
| | | acc_norm | 0.2741 | ± | 0.0385 |
| hendrycksTest-astronomy | 1 | acc | 0.1776 | ± | 0.0311 |
| | | acc_norm | 0.1776 | ± | 0.0311 |
| hendrycksTest-business_ethics | 1 | acc | 0.2100 | ± | 0.0409 |
| | | acc_norm | 0.2100 | ± | 0.0409 |
| hendrycksTest-clinical_knowledge | 1 | acc | 0.2264 | ± | 0.0258 |
| | | acc_norm | 0.2264 | ± | 0.0258 |
| hendrycksTest-college_biology | 1 | acc | 0.2500 | ± | 0.0362 |
| | | acc_norm | 0.2500 | ± | 0.0362 |
| hendrycksTest-college_chemistry | 1 | acc | 0.1500 | ± | 0.0359 |
| | | acc_norm | 0.1500 | ± | 0.0359 |
| hendrycksTest-college_computer_science | 1 | acc | 0.1600 | ± | 0.0368 |
| | | acc_norm | 0.1600 | ± | 0.0368 |
| hendrycksTest-college_mathematics | 1 | acc | 0.3000 | ± | 0.0461 |
| | | acc_norm | 0.3000 | ± | 0.0461 |
| hendrycksTest-college_medicine | 1 | acc | 0.1908 | ± | 0.0300 |
| | | acc_norm | 0.1908 | ± | 0.0300 |
| hendrycksTest-college_physics | 1 | acc | 0.2157 | ± | 0.0409 |
| | | acc_norm | 0.2157 | ± | 0.0409 |
| hendrycksTest-computer_security | 1 | acc | 0.2200 | ± | 0.0416 |
| | | acc_norm | 0.2200 | ± | 0.0416 |
| hendrycksTest-conceptual_physics | 1 | acc | 0.2383 | ± | 0.0279 |
| | | acc_norm | 0.2383 | ± | 0.0279 |
| hendrycksTest-econometrics | 1 | acc | 0.2456 | ± | 0.0405 |
| | | acc_norm | 0.2456 | ± | 0.0405 |
| hendrycksTest-electrical_engineering | 1 | acc | 0.2276 | ± | 0.0349 |
| | | acc_norm | 0.2276 | ± | 0.0349 |
| hendrycksTest-elementary_mathematics | 1 | acc | 0.1772 | ± | 0.0197 |
| | | acc_norm | 0.1772 | ± | 0.0197 |
| hendrycksTest-formal_logic | 1 | acc | 0.2460 | ± | 0.0385 |
| | | acc_norm | 0.2460 | ± | 0.0385 |
| hendrycksTest-global_facts | 1 | acc | 0.2400 | ± | 0.0429 |
| | | acc_norm | 0.2400 | ± | 0.0429 |
| hendrycksTest-high_school_biology | 1 | acc | 0.3065 | ± | 0.0262 |
| | | acc_norm | 0.3065 | ± | 0.0262 |
| hendrycksTest-high_school_chemistry | 1 | acc | 0.2759 | ± | 0.0314 |
| | | acc_norm | 0.2759 | ± | 0.0314 |
| hendrycksTest-high_school_computer_science | 1 | acc | 0.1600 | ± | 0.0368 |
| | | acc_norm | 0.1600 | ± | 0.0368 |
| hendrycksTest-high_school_european_history | 1 | acc | 0.2242 | ± | 0.0326 |
| | | acc_norm | 0.2242 | ± | 0.0326 |
| hendrycksTest-high_school_geography | 1 | acc | 0.2828 | ± | 0.0321 |
| | | acc_norm | 0.2828 | ± | 0.0321 |
| hendrycksTest-high_school_government_and_politics | 1 | acc | 0.3472 | ± | 0.0344 |
| | | acc_norm | 0.3472 | ± | 0.0344 |
| hendrycksTest-high_school_macroeconomics | 1 | acc | 0.3026 | ± | 0.0233 |
| | | acc_norm | 0.3026 | ± | 0.0233 |
| hendrycksTest-high_school_mathematics | 1 | acc | 0.2667 | ± | 0.0270 |
| | | acc_norm | 0.2667 | ± | 0.0270 |
| hendrycksTest-high_school_microeconomics | 1 | acc | 0.2983 | ± | 0.0297 |
| | | acc_norm | 0.2983 | ± | 0.0297 |
| hendrycksTest-high_school_physics | 1 | acc | 0.1722 | ± | 0.0308 |
| | | acc_norm | 0.1722 | ± | 0.0308 |
| hendrycksTest-high_school_psychology | 1 | acc | 0.2312 | ± | 0.0181 |
| | | acc_norm | 0.2312 | ± | 0.0181 |
| hendrycksTest-high_school_statistics | 1 | acc | 0.4167 | ± | 0.0336 |
| | | acc_norm | 0.4167 | ± | 0.0336 |
| hendrycksTest-high_school_us_history | 1 | acc | 0.2451 | ± | 0.0302 |
| | | acc_norm | 0.2451 | ± | 0.0302 |
| hendrycksTest-high_school_world_history | 1 | acc | 0.2489 | ± | 0.0281 |
| | | acc_norm | 0.2489 | ± | 0.0281 |
| hendrycksTest-human_aging | 1 | acc | 0.2422 | ± | 0.0288 |
| | | acc_norm | 0.2422 | ± | 0.0288 |
| hendrycksTest-human_sexuality | 1 | acc | 0.2214 | ± | 0.0364 |
| | | acc_norm | 0.2214 | ± | 0.0364 |
| hendrycksTest-international_law | 1 | acc | 0.3223 | ± | 0.0427 |
| | | acc_norm | 0.3223 | ± | 0.0427 |
| hendrycksTest-jurisprudence | 1 | acc | 0.2500 | ± | 0.0419 |
| | | acc_norm | 0.2500 | ± | 0.0419 |
| hendrycksTest-logical_fallacies | 1 | acc | 0.2454 | ± | 0.0338 |
| | | acc_norm | 0.2454 | ± | 0.0338 |
| hendrycksTest-machine_learning | 1 | acc | 0.1964 | ± | 0.0377 |
| | | acc_norm | 0.1964 | ± | 0.0377 |
| hendrycksTest-management | 1 | acc | 0.2427 | ± | 0.0425 |
| | | acc_norm | 0.2427 | ± | 0.0425 |
| hendrycksTest-marketing | 1 | acc | 0.2009 | ± | 0.0262 |
| | | acc_norm | 0.2009 | ± | 0.0262 |
| hendrycksTest-medical_genetics | 1 | acc | 0.2400 | ± | 0.0429 |
| | | acc_norm | 0.2400 | ± | 0.0429 |
| hendrycksTest-miscellaneous | 1 | acc | 0.2593 | ± | 0.0157 |
| | | acc_norm | 0.2593 | ± | 0.0157 |
| hendrycksTest-moral_disputes | 1 | acc | 0.2486 | ± | 0.0233 |
| | | acc_norm | 0.2486 | ± | 0.0233 |
| hendrycksTest-moral_scenarios | 1 | acc | 0.2469 | ± | 0.0144 |
| | | acc_norm | 0.2469 | ± | 0.0144 |
| hendrycksTest-nutrition | 1 | acc | 0.2157 | ± | 0.0236 |
| | | acc_norm | 0.2157 | ± | 0.0236 |
| hendrycksTest-philosophy | 1 | acc | 0.2830 | ± | 0.0256 |
| | | acc_norm | 0.2830 | ± | 0.0256 |
| hendrycksTest-prehistory | 1 | acc | 0.2377 | ± | 0.0237 |
| | | acc_norm | 0.2377 | ± | 0.0237 |
| hendrycksTest-professional_accounting | 1 | acc | 0.2801 | ± | 0.0268 |
| | | acc_norm | 0.2801 | ± | 0.0268 |
| hendrycksTest-professional_law | 1 | acc | 0.2458 | ± | 0.0110 |
| | | acc_norm | 0.2458 | ± | 0.0110 |
| hendrycksTest-professional_medicine | 1 | acc | 0.2794 | ± | 0.0273 |
| | | acc_norm | 0.2794 | ± | 0.0273 |
| hendrycksTest-professional_psychology | 1 | acc | 0.2598 | ± | 0.0177 |
| | | acc_norm | 0.2598 | ± | 0.0177 |
| hendrycksTest-public_relations | 1 | acc | 0.2273 | ± | 0.0401 |
| | | acc_norm | 0.2273 | ± | 0.0401 |
| hendrycksTest-security_studies | 1 | acc | 0.3388 | ± | 0.0303 |
| | | acc_norm | 0.3388 | ± | 0.0303 |
| hendrycksTest-sociology | 1 | acc | 0.2189 | ± | 0.0292 |
| | | acc_norm | 0.2189 | ± | 0.0292 |
| hendrycksTest-us_foreign_policy | 1 | acc | 0.2100 | ± | 0.0409 |
| | | acc_norm | 0.2100 | ± | 0.0409 |
| hendrycksTest-virology | 1 | acc | 0.2169 | ± | 0.0321 |
| | | acc_norm | 0.2169 | ± | 0.0321 |
| hendrycksTest-world_religions | 1 | acc | 0.2047 | ± | 0.0309 |
| | | acc_norm | 0.2047 | ± | 0.0309 |
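
The hendrycksTest results are reported per subtask only, so a single summary number has to be derived by the reader. One common choice is an unweighted (macro) average of the per-subtask `acc` values. The sketch below shows the calculation on the first three rows copied from the table above. The names `SAMPLE_ROWS` and `macro_average` are hypothetical, and the printed figure covers only those three rows, not the full benchmark.

```python
from statistics import mean

# First three (subtask, acc) pairs copied from the 5-shot table above.
# Extend this list with the remaining rows to average over every subtask.
SAMPLE_ROWS = [
    ("hendrycksTest-abstract_algebra", 0.2200),
    ("hendrycksTest-anatomy", 0.2741),
    ("hendrycksTest-astronomy", 0.1776),
]


def macro_average(rows):
    """Unweighted mean of per-subtask accuracy; every subtask counts equally."""
    return mean(acc for _, acc in rows)


print(f"{macro_average(SAMPLE_ROWS):.4f}")
```

A macro average ignores how many questions each subtask contains. Weighting by question count would need the per-subtask sizes, which this file does not list.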