hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 0, batch_size: 64

|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|arc_easy| 0|acc |0.4322|± |0.0102|
| | |acc_norm|0.3868|± |0.0100|
|boolq| 1|acc |0.6092|± |0.0085|
|lambada_openai| 0|ppl |74.2399|± |2.9038|
| | |acc |0.2604|± |0.0061|
|openbookqa| 0|acc |0.1440|± |0.0157|
| | |acc_norm|0.2780|± |0.0201|
|piqa| 0|acc |0.5909|± |0.0115|
| | |acc_norm|0.5871|± |0.0115|
|winogrande| 0|acc |0.5225|± |0.0140|

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 25, batch_size: 64

|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|arc_challenge| 0|acc |0.1817|± |0.0113|
| | |acc_norm|0.2329|± |0.0124|

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 10, batch_size: 64

|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|hellaswag| 0|acc |0.2792|± |0.0045|
| | |acc_norm|0.2865|± |0.0045|

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 0, batch_size: 64

|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|truthfulqa_mc| 1|mc1 |0.2485|± |0.0151|
| | |mc2 |0.4594|± |0.0151|

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 5, batch_size: 64

For this run, `acc` and `acc_norm` are identical on every task, so a single value per task is shown.

|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|hendrycksTest-abstract_algebra| 1|acc/acc_norm|0.2200|± |0.0416|
|hendrycksTest-anatomy| 1|acc/acc_norm|0.2741|± |0.0385|
|hendrycksTest-astronomy| 1|acc/acc_norm|0.1776|± |0.0311|
|hendrycksTest-business_ethics| 1|acc/acc_norm|0.2100|± |0.0409|
|hendrycksTest-clinical_knowledge| 1|acc/acc_norm|0.2264|± |0.0258|
|hendrycksTest-college_biology| 1|acc/acc_norm|0.2500|± |0.0362|
|hendrycksTest-college_chemistry| 1|acc/acc_norm|0.1500|± |0.0359|
|hendrycksTest-college_computer_science| 1|acc/acc_norm|0.1600|± |0.0368|
|hendrycksTest-college_mathematics| 1|acc/acc_norm|0.3000|± |0.0461|
|hendrycksTest-college_medicine| 1|acc/acc_norm|0.1908|± |0.0300|
|hendrycksTest-college_physics| 1|acc/acc_norm|0.2157|± |0.0409|
|hendrycksTest-computer_security| 1|acc/acc_norm|0.2200|± |0.0416|
|hendrycksTest-conceptual_physics| 1|acc/acc_norm|0.2383|± |0.0279|
|hendrycksTest-econometrics| 1|acc/acc_norm|0.2456|± |0.0405|
|hendrycksTest-electrical_engineering| 1|acc/acc_norm|0.2276|± |0.0349|
|hendrycksTest-elementary_mathematics| 1|acc/acc_norm|0.1772|± |0.0197|
|hendrycksTest-formal_logic| 1|acc/acc_norm|0.2460|± |0.0385|
|hendrycksTest-global_facts| 1|acc/acc_norm|0.2400|± |0.0429|
|hendrycksTest-high_school_biology| 1|acc/acc_norm|0.3065|± |0.0262|
|hendrycksTest-high_school_chemistry| 1|acc/acc_norm|0.2759|± |0.0314|
|hendrycksTest-high_school_computer_science| 1|acc/acc_norm|0.1600|± |0.0368|
|hendrycksTest-high_school_european_history| 1|acc/acc_norm|0.2242|± |0.0326|
|hendrycksTest-high_school_geography| 1|acc/acc_norm|0.2828|± |0.0321|
|hendrycksTest-high_school_government_and_politics| 1|acc/acc_norm|0.3472|± |0.0344|
|hendrycksTest-high_school_macroeconomics| 1|acc/acc_norm|0.3026|± |0.0233|
|hendrycksTest-high_school_mathematics| 1|acc/acc_norm|0.2667|± |0.0270|
|hendrycksTest-high_school_microeconomics| 1|acc/acc_norm|0.2983|± |0.0297|
|hendrycksTest-high_school_physics| 1|acc/acc_norm|0.1722|± |0.0308|
|hendrycksTest-high_school_psychology| 1|acc/acc_norm|0.2312|± |0.0181|
|hendrycksTest-high_school_statistics| 1|acc/acc_norm|0.4167|± |0.0336|
|hendrycksTest-high_school_us_history| 1|acc/acc_norm|0.2451|± |0.0302|
|hendrycksTest-high_school_world_history| 1|acc/acc_norm|0.2489|± |0.0281|
|hendrycksTest-human_aging| 1|acc/acc_norm|0.2422|± |0.0288|
|hendrycksTest-human_sexuality| 1|acc/acc_norm|0.2214|± |0.0364|
|hendrycksTest-international_law| 1|acc/acc_norm|0.3223|± |0.0427|
|hendrycksTest-jurisprudence| 1|acc/acc_norm|0.2500|± |0.0419|
|hendrycksTest-logical_fallacies| 1|acc/acc_norm|0.2454|± |0.0338|
|hendrycksTest-machine_learning| 1|acc/acc_norm|0.1964|± |0.0377|
|hendrycksTest-management| 1|acc/acc_norm|0.2427|± |0.0425|
|hendrycksTest-marketing| 1|acc/acc_norm|0.2009|± |0.0262|
|hendrycksTest-medical_genetics| 1|acc/acc_norm|0.2400|± |0.0429|
|hendrycksTest-miscellaneous| 1|acc/acc_norm|0.2593|± |0.0157|
|hendrycksTest-moral_disputes| 1|acc/acc_norm|0.2486|± |0.0233|
|hendrycksTest-moral_scenarios| 1|acc/acc_norm|0.2469|± |0.0144|
|hendrycksTest-nutrition| 1|acc/acc_norm|0.2157|± |0.0236|
|hendrycksTest-philosophy| 1|acc/acc_norm|0.2830|± |0.0256|
|hendrycksTest-prehistory| 1|acc/acc_norm|0.2377|± |0.0237|
|hendrycksTest-professional_accounting| 1|acc/acc_norm|0.2801|± |0.0268|
|hendrycksTest-professional_law| 1|acc/acc_norm|0.2458|± |0.0110|
|hendrycksTest-professional_medicine| 1|acc/acc_norm|0.2794|± |0.0273|
|hendrycksTest-professional_psychology| 1|acc/acc_norm|0.2598|± |0.0177|
|hendrycksTest-public_relations| 1|acc/acc_norm|0.2273|± |0.0401|
|hendrycksTest-security_studies| 1|acc/acc_norm|0.3388|± |0.0303|
|hendrycksTest-sociology| 1|acc/acc_norm|0.2189|± |0.0292|
|hendrycksTest-us_foreign_policy| 1|acc/acc_norm|0.2100|± |0.0409|
|hendrycksTest-virology| 1|acc/acc_norm|0.2169|± |0.0321|
|hendrycksTest-world_religions| 1|acc/acc_norm|0.2047|± |0.0309|
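These tables are raw output from EleutherAI's lm-evaluation-harness, run through the pre-refactor `hf-causal-experimental` model interface (roughly the v0.3.x harness). As a rough guide to reproducing the zero-shot block, the sketch below mirrors its settings; the exact `simple_evaluate` keyword names vary between harness releases, so treat this call shape as an assumption rather than a verified invocation.

```python
# Sketch of reproducing the zero-shot block with the pre-refactor
# lm-evaluation-harness (the version exposing `hf-causal-experimental`).
# Keyword names may differ across harness releases; verify against the
# installed version before relying on this.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args=(
        "pretrained=BEE-spoke-data/smol_llama-101M-GQA,"
        "trust_remote_code=True,dtype=float"
    ),
    tasks=["arc_easy", "boolq", "lambada_openai",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,   # the first block above; other blocks use 25/10/5
    batch_size=64,
    limit=None,      # evaluate the full test sets
)

# Render markdown tables in the same format as the results above.
print(evaluator.make_table(results))
```

The few-shot blocks differ only in the task list and the `num_fewshot` value (25 for arc_challenge, 10 for hellaswag, 5 for the hendrycksTest/MMLU suite); the same settings can also be passed to the harness's `main.py` CLI via its `--model`, `--model_args`, `--tasks`, `--num_fewshot`, and `--batch_size` flags.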