---
license: apache-2.0
tags:
- merge
- mergekit
---

# MetaModel

This model is a merge of the following models made with [mergekit](https://github.com/cg123/mergekit):
* [jeonsworld/CarbonVillain-en-10.7B-v4](https://huggingface.co/jeonsworld/CarbonVillain-en-10.7B-v4)
* [kekmodel/StopCarbon-10.7B-v5](https://huggingface.co/kekmodel/StopCarbon-10.7B-v5)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: jeonsworld/CarbonVillain-en-10.7B-v4
        layer_range: [0, 48]
      - model: kekmodel/StopCarbon-10.7B-v5
        layer_range: [0, 48]
merge_method: slerp
base_model: jeonsworld/CarbonVillain-en-10.7B-v4
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

# Dataset Card for Evaluation run of gagan3012/MetaModel

Dataset automatically created during the evaluation run of model [gagan3012/MetaModel](https://huggingface.co/gagan3012/MetaModel) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

The dataset is composed of 63 configurations, each one corresponding to one of the evaluated tasks.

The dataset has been created from 1 run. Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results.

An additional configuration, "results", stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).

To load the details from a run, you can for instance do the following:

```python
from datasets import load_dataset

data = load_dataset("open-llm-leaderboard/details_gagan3012__MetaModel",
                    "harness_winogrande_5",
                    split="train")
```

## Latest results

These are the [latest results from run 2024-01-04T14:09:43.780941](https://huggingface.co/datasets/open-llm-leaderboard/details_gagan3012__MetaModel/blob/main/results_2024-01-04T14-09-43.780941.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks. You can find each in the results and the "latest" split for each eval):

```python
{
    "all": {
        "acc": 0.6664380298886512,
        "acc_stderr": 0.031642195230944255,
        "acc_norm": 0.6671639222858992,
        "acc_norm_stderr": 0.03228745343467652,
        "mc1": 0.5691554467564259,
        "mc1_stderr": 0.01733527247533237,
        "mc2": 0.7184177934834866,
        "mc2_stderr": 0.014995634120330182
    },
    "harness|arc:challenge|25": {
        "acc": 0.6843003412969283,
        "acc_stderr": 0.013582571095815291,
        "acc_norm": 0.7107508532423208,
        "acc_norm_stderr": 0.01325001257939344
    },
    "harness|hellaswag|10": {
        "acc": 0.7132045409281019,
        "acc_stderr": 0.004513409114983828,
        "acc_norm": 0.8844851623182632,
        "acc_norm_stderr": 0.0031898897894046684
    },
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc": 0.43,
        "acc_stderr": 0.049756985195624284,
        "acc_norm": 0.43,
        "acc_norm_stderr": 0.049756985195624284
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc": 0.6148148148148148,
        "acc_stderr": 0.04203921040156279,
        "acc_norm": 0.6148148148148148,
        "acc_norm_stderr": 0.04203921040156279
    },
    "harness|hendrycksTest-astronomy|5": {
        "acc": 0.743421052631579,
        "acc_stderr": 0.0355418036802569,
        "acc_norm": 0.743421052631579,
        "acc_norm_stderr": 0.0355418036802569
    },
    "harness|hendrycksTest-business_ethics|5": {
        "acc": 0.75,
        "acc_stderr": 0.04351941398892446,
        "acc_norm": 0.75,
        "acc_norm_stderr": 0.04351941398892446
    },
    "harness|hendrycksTest-clinical_knowledge|5": {
        "acc": 0.6830188679245283,
        "acc_stderr": 0.02863723563980089,
        "acc_norm": 0.6830188679245283,
        "acc_norm_stderr": 0.02863723563980089
    },
    "harness|hendrycksTest-college_biology|5": {
        "acc": 0.7638888888888888,
        "acc_stderr": 0.03551446610810826,
        "acc_norm": 0.7638888888888888,
        "acc_norm_stderr": 0.03551446610810826
    },
    "harness|hendrycksTest-college_chemistry|5": {
        "acc": 0.47,
        "acc_stderr": 0.050161355804659205,
        "acc_norm": 0.47,
        "acc_norm_stderr": 0.050161355804659205
    },
    "harness|hendrycksTest-college_computer_science|5": {
        "acc": 0.48,
        "acc_stderr": 0.05021167315686781,
        "acc_norm": 0.48,
        "acc_norm_stderr": 0.05021167315686781
    },
    "harness|hendrycksTest-college_mathematics|5": {
        "acc": 0.32,
        "acc_stderr": 0.046882617226215034,
        "acc_norm": 0.32,
        "acc_norm_stderr": 0.046882617226215034
    },
    "harness|hendrycksTest-college_medicine|5": {
        "acc": 0.6647398843930635,
        "acc_stderr": 0.03599586301247077,
        "acc_norm": 0.6647398843930635,
        "acc_norm_stderr": 0.03599586301247077
    },
    "harness|hendrycksTest-college_physics|5": {
        "acc": 0.38235294117647056,
        "acc_stderr": 0.04835503696107223,
        "acc_norm": 0.38235294117647056,
        "acc_norm_stderr": 0.04835503696107223
    },
    "harness|hendrycksTest-computer_security|5": {
        "acc": 0.75,
        "acc_stderr": 0.04351941398892446,
        "acc_norm": 0.75,
        "acc_norm_stderr": 0.04351941398892446
    },
    "harness|hendrycksTest-conceptual_physics|5": {
        "acc": 0.625531914893617,
        "acc_stderr": 0.03163910665367291,
        "acc_norm": 0.625531914893617,
        "acc_norm_stderr": 0.03163910665367291
    },
    "harness|hendrycksTest-econometrics|5": {
        "acc": 0.4824561403508772,
        "acc_stderr": 0.04700708033551038,
        "acc_norm": 0.4824561403508772,
        "acc_norm_stderr": 0.04700708033551038
    },
    "harness|hendrycksTest-electrical_engineering|5": {
        "acc": 0.6413793103448275,
        "acc_stderr": 0.039966295748767186,
        "acc_norm": 0.6413793103448275,
        "acc_norm_stderr": 0.039966295748767186
    },
    "harness|hendrycksTest-elementary_mathematics|5": {
        "acc": 0.5,
        "acc_stderr": 0.025751310131230234,
        "acc_norm": 0.5,
        "acc_norm_stderr": 0.025751310131230234
    },
    "harness|hendrycksTest-formal_logic|5": {
        "acc": 0.42857142857142855,
        "acc_stderr": 0.0442626668137991,
        "acc_norm": 0.42857142857142855,
        "acc_norm_stderr": 0.0442626668137991
    },
    "harness|hendrycksTest-global_facts|5": {
        "acc": 0.35,
        "acc_stderr": 0.047937248544110196,
        "acc_norm": 0.35,
        "acc_norm_stderr": 0.047937248544110196
    },
    "harness|hendrycksTest-high_school_biology|5": {
        "acc": 0.8129032258064516,
        "acc_stderr": 0.022185710092252252,
        "acc_norm": 0.8129032258064516,
        "acc_norm_stderr": 0.022185710092252252
    },
    "harness|hendrycksTest-high_school_chemistry|5": {
        "acc": 0.5073891625615764,
        "acc_stderr": 0.035176035403610105,
        "acc_norm": 0.5073891625615764,
        "acc_norm_stderr": 0.035176035403610105
    },
    "harness|hendrycksTest-high_school_computer_science|5": {
        "acc": 0.72,
        "acc_stderr": 0.04512608598542128,
        "acc_norm": 0.72,
        "acc_norm_stderr": 0.04512608598542128
    },
    "harness|hendrycksTest-high_school_european_history|5": {
        "acc": 0.8121212121212121,
        "acc_stderr": 0.03050193405942914,
        "acc_norm": 0.8121212121212121,
        "acc_norm_stderr": 0.03050193405942914
    },
    "harness|hendrycksTest-high_school_geography|5": {
        "acc": 0.8636363636363636,
        "acc_stderr": 0.024450155973189835,
        "acc_norm": 0.8636363636363636,
        "acc_norm_stderr": 0.024450155973189835
    },
    "harness|hendrycksTest-high_school_government_and_politics|5": {
        "acc": 0.8963730569948186,
        "acc_stderr": 0.021995311963644244,
        "acc_norm": 0.8963730569948186,
        "acc_norm_stderr": 0.021995311963644244
    },
    "harness|hendrycksTest-high_school_macroeconomics|5": {
        "acc": 0.6692307692307692,
        "acc_stderr": 0.02385479568097114,
        "acc_norm": 0.6692307692307692,
        "acc_norm_stderr": 0.02385479568097114
    },
    "harness|hendrycksTest-high_school_mathematics|5": {
        "acc": 0.37037037037037035,
        "acc_stderr": 0.02944316932303154,
        "acc_norm": 0.37037037037037035,
        "acc_norm_stderr": 0.02944316932303154
    },
    "harness|hendrycksTest-high_school_microeconomics|5": {
        "acc": 0.7142857142857143,
        "acc_stderr": 0.029344572500634332,
        "acc_norm": 0.7142857142857143,
        "acc_norm_stderr": 0.029344572500634332
    },
    "harness|hendrycksTest-high_school_physics|5": {
        "acc": 0.3708609271523179,
        "acc_stderr": 0.03943966699183629,
        "acc_norm": 0.3708609271523179,
        "acc_norm_stderr": 0.03943966699183629
    },
    "harness|hendrycksTest-high_school_psychology|5": {
        "acc": 0.8422018348623853,
        "acc_stderr": 0.01563002297009246,
        "acc_norm": 0.8422018348623853,
        "acc_norm_stderr": 0.01563002297009246
    },
    "harness|hendrycksTest-high_school_statistics|5": {
        "acc": 0.5740740740740741,
        "acc_stderr": 0.03372343271653062,
        "acc_norm": 0.5740740740740741,
        "acc_norm_stderr": 0.03372343271653062
    },
    "harness|hendrycksTest-high_school_us_history|5": {
        "acc": 0.8578431372549019,
        "acc_stderr": 0.02450980392156862,
        "acc_norm": 0.8578431372549019,
        "acc_norm_stderr": 0.02450980392156862
    },
    "harness|hendrycksTest-high_school_world_history|5": {
        "acc": 0.8565400843881856,
        "acc_stderr": 0.022818291821017012,
        "acc_norm": 0.8565400843881856,
        "acc_norm_stderr": 0.022818291821017012
    },
    "harness|hendrycksTest-human_aging|5": {
        "acc": 0.672645739910314,
        "acc_stderr": 0.03149384670994131,
        "acc_norm": 0.672645739910314,
        "acc_norm_stderr": 0.03149384670994131
    },
    "harness|hendrycksTest-human_sexuality|5": {
        "acc": 0.7557251908396947,
        "acc_stderr": 0.03768335959728743,
        "acc_norm": 0.7557251908396947,
        "acc_norm_stderr": 0.03768335959728743
    },
    "harness|hendrycksTest-international_law|5": {
        "acc": 0.7851239669421488,
        "acc_stderr": 0.037494924487096966,
        "acc_norm": 0.7851239669421488,
        "acc_norm_stderr": 0.037494924487096966
    },
    "harness|hendrycksTest-jurisprudence|5": {
        "acc": 0.8055555555555556,
        "acc_stderr": 0.038260763248848646,
        "acc_norm": 0.8055555555555556,
        "acc_norm_stderr": 0.038260763248848646
    },
    "harness|hendrycksTest-logical_fallacies|5": {
        "acc": 0.754601226993865,
        "acc_stderr": 0.03380939813943354,
        "acc_norm": 0.754601226993865,
        "acc_norm_stderr": 0.03380939813943354
    },
    "harness|hendrycksTest-machine_learning|5": {
        "acc": 0.4732142857142857,
        "acc_stderr": 0.047389751192741546,
        "acc_norm": 0.4732142857142857,
        "acc_norm_stderr": 0.047389751192741546
    },
    "harness|hendrycksTest-management|5": {
        "acc": 0.8446601941747572,
        "acc_stderr": 0.035865947385739734,
        "acc_norm": 0.8446601941747572,
        "acc_norm_stderr": 0.035865947385739734
    },
    "harness|hendrycksTest-marketing|5": {
        "acc": 0.8589743589743589,
        "acc_stderr": 0.02280138253459753,
        "acc_norm": 0.8589743589743589,
        "acc_norm_stderr": 0.02280138253459753
    },
    "harness|hendrycksTest-medical_genetics|5": {
        "acc": 0.7,
        "acc_stderr": 0.046056618647183814,
        "acc_norm": 0.7,
        "acc_norm_stderr": 0.046056618647183814
    },
    "harness|hendrycksTest-miscellaneous|5": {
        "acc": 0.8084291187739464,
        "acc_stderr": 0.014072859310451949,
        "acc_norm": 0.8084291187739464,
        "acc_norm_stderr": 0.014072859310451949
    },
    "harness|hendrycksTest-moral_disputes|5": {
        "acc": 0.7572254335260116,
        "acc_stderr": 0.023083658586984204,
        "acc_norm": 0.7572254335260116,
        "acc_norm_stderr": 0.023083658586984204
    },
    "harness|hendrycksTest-moral_scenarios|5": {
        "acc": 0.39664804469273746,
        "acc_stderr": 0.016361354769822468,
        "acc_norm": 0.39664804469273746,
        "acc_norm_stderr": 0.016361354769822468
    },
    "harness|hendrycksTest-nutrition|5": {
        "acc": 0.7581699346405228,
        "acc_stderr": 0.024518195641879334,
        "acc_norm": 0.7581699346405228,
        "acc_norm_stderr": 0.024518195641879334
    },
    "harness|hendrycksTest-philosophy|5": {
        "acc": 0.7202572347266881,
        "acc_stderr": 0.025494259350694905,
        "acc_norm": 0.7202572347266881,
        "acc_norm_stderr": 0.025494259350694905
    },
    "harness|hendrycksTest-prehistory|5": {
        "acc": 0.7777777777777778,
        "acc_stderr": 0.02313237623454333,
        "acc_norm": 0.7777777777777778,
        "acc_norm_stderr": 0.02313237623454333
    },
    "harness|hendrycksTest-professional_accounting|5": {
        "acc": 0.5035460992907801,
        "acc_stderr": 0.02982674915328092,
        "acc_norm": 0.5035460992907801,
        "acc_norm_stderr": 0.02982674915328092
    },
    "harness|hendrycksTest-professional_law|5": {
        "acc": 0.49478487614080835,
        "acc_stderr": 0.012769541449652547,
        "acc_norm": 0.49478487614080835,
        "acc_norm_stderr": 0.012769541449652547
    },
    "harness|hendrycksTest-professional_medicine|5": {
        "acc": 0.75,
        "acc_stderr": 0.026303648393696036,
        "acc_norm": 0.75,
        "acc_norm_stderr": 0.026303648393696036
    },
    "harness|hendrycksTest-professional_psychology|5": {
        "acc": 0.6813725490196079,
        "acc_stderr": 0.018850084696468712,
        "acc_norm": 0.6813725490196079,
        "acc_norm_stderr": 0.018850084696468712
    },
    "harness|hendrycksTest-public_relations|5": {
        "acc": 0.6818181818181818,
        "acc_stderr": 0.04461272175910509,
        "acc_norm": 0.6818181818181818,
        "acc_norm_stderr": 0.04461272175910509
    },
    "harness|hendrycksTest-security_studies|5": {
        "acc": 0.746938775510204,
        "acc_stderr": 0.027833023871399677,
        "acc_norm": 0.746938775510204,
        "acc_norm_stderr": 0.027833023871399677
    },
    "harness|hendrycksTest-sociology|5": {
        "acc": 0.8258706467661692,
        "acc_stderr": 0.026814951200421603,
        "acc_norm": 0.8258706467661692,
        "acc_norm_stderr": 0.026814951200421603
    },
    "harness|hendrycksTest-us_foreign_policy|5": {
        "acc": 0.91,
        "acc_stderr": 0.028762349126466125,
        "acc_norm": 0.91,
        "acc_norm_stderr": 0.028762349126466125
    },
    "harness|hendrycksTest-virology|5": {
        "acc": 0.5783132530120482,
        "acc_stderr": 0.038444531817709175,
        "acc_norm": 0.5783132530120482,
        "acc_norm_stderr": 0.038444531817709175
    },
    "harness|hendrycksTest-world_religions|5": {
        "acc": 0.7777777777777778,
        "acc_stderr": 0.03188578017686398,
        "acc_norm": 0.7777777777777778,
        "acc_norm_stderr": 0.03188578017686398
    },
    "harness|truthfulqa:mc|0": {
        "mc1": 0.5691554467564259,
        "mc1_stderr": 0.01733527247533237,
        "mc2": 0.7184177934834866,
        "mc2_stderr": 0.014995634120330182
    },
    "harness|winogrande|5": {
        "acc": 0.8342541436464088,
        "acc_stderr": 0.010450899545370632
    },
    "harness|gsm8k|5": {
        "acc": 0.6535253980288097,
        "acc_stderr": 0.013107179054313398
    }
}
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_gagan3012__MetaModel)

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 74.4                      |
| ARC (25-shot)         | 71.08                     |
| HellaSwag (10-shot)   | 88.45                     |
| MMLU (5-shot)         | 66.26                     |
| TruthfulQA (0-shot)   | 71.84                     |
| Winogrande (5-shot)   | 83.43                     |
| GSM8K (5-shot)        | 65.35                     |
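
The table values can be cross-checked against the "Latest results" JSON with a little arithmetic. The sketch below assumes that the ARC and HellaSwag rows use `acc_norm`, that TruthfulQA uses `mc2`, that Winogrande and GSM8K use `acc`, and that "Avg." is the plain mean of the six rows; these mappings are inferred from the numbers in this card, not from leaderboard documentation. The MMLU figure is copied from the table rather than recomputed, and `leaderboard_average` is a hypothetical helper name.

```python
# Scores copied from the "Latest results" JSON in this card, scaled to percent.
# Assumption: the metric chosen per benchmark is inferred from matching numbers.
scores = {
    "ARC (25-shot)": 0.7107508532423208 * 100,        # acc_norm
    "HellaSwag (10-shot)": 0.8844851623182632 * 100,  # acc_norm
    "MMLU (5-shot)": 66.26,                           # taken from the table as-is
    "TruthfulQA (0-shot)": 0.7184177934834866 * 100,  # mc2
    "Winogrande (5-shot)": 0.8342541436464088 * 100,  # acc
    "GSM8K (5-shot)": 0.6535253980288097 * 100,       # acc
}


def leaderboard_average(values):
    """Hypothetical helper: unweighted mean of the per-benchmark scores."""
    values = list(values)
    return sum(values) / len(values)


# Round each score to two decimals, as the table does.
rounded = {name: round(value, 2) for name, value in scores.items()}
average = leaderboard_average(scores.values())

for name, value in rounded.items():
    print(f"{name}: {value}")
print(f"Avg.: {round(average, 2)}")
```

Under these assumptions the rounded scores reproduce the table rows, and the mean rounds to the reported 74.4.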