---
language:
- en
license: cc-by-nc-sa-4.0
pipeline_tag: text-generation
model-index:
- name: Merge_Sakura_Solar
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 70.73
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 88.51
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.03
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 72.21
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.99
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
---

**Explanation**

- Merged the three models below using [mergekit](https://github.com/arcee-ai/mergekit) with the `dare_ties` merge method

**Models**

- [Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct)
- [Sakura-SOLRCA-Math-Instruct-DPO-v2](https://huggingface.co/kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2)
- [Sakura-SOLRCA-Instruct-DPO](https://huggingface.co/kyujinpy/Sakura-SOLRCA-Instruct-DPO)

**Score**

|Average|ARC|HellaSwag|MMLU|TruthfulQA|Winogrande|GSM8K|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|74.03|70.73|88.51|66.03|72.21|82.72|63.99|

**Original Author's HuggingFace profile**

- [kyujinpy](https://huggingface.co/kyujinpy)

**License**

- Follows the license stated on the original author's model pages

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dddsaty__Merge_Sakura_Solar)

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |74.03|
|AI2 Reasoning Challenge (25-Shot)|70.73|
|HellaSwag (10-Shot)              |88.51|
|MMLU (5-Shot)                    |66.03|
|TruthfulQA (0-shot)              |72.21|
|Winogrande (5-shot)              |82.72|
|GSM8k (5-shot)                   |63.99|
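A `dare_ties` merge of the three listed models could be expressed as a mergekit YAML config along these lines. This is only a sketch: the author's actual configuration (base model choice, `density`/`weight` values, dtype) was not published, so every parameter below is an assumption.

```yaml
# Hypothetical mergekit config; all parameter values are assumptions,
# not the settings used to produce Merge_Sakura_Solar.
models:
  - model: kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2
    parameters:
      density: 0.5   # assumed fraction of delta weights kept by DARE
      weight: 0.5    # assumed merge weight
  - model: kyujinpy/Sakura-SOLRCA-Instruct-DPO
    parameters:
      density: 0.5   # assumed
      weight: 0.5    # assumed
merge_method: dare_ties
base_model: kyujinpy/Sakura-SOLAR-Instruct  # assumed base; deltas are taken relative to it
dtype: float16
```

With mergekit installed, a config like this is run with `mergekit-yaml config.yml ./merged-model`.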
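The leaderboard "Avg." is the unweighted mean of the six benchmark scores, which can be verified directly from the table:

```python
# Recompute the Open LLM Leaderboard average from the six
# per-benchmark scores reported in the table above.
scores = {
    "ARC (25-shot)": 70.73,
    "HellaSwag (10-shot)": 88.51,
    "MMLU (5-shot)": 66.03,
    "TruthfulQA (0-shot)": 72.21,
    "Winogrande (5-shot)": 82.72,
    "GSM8K (5-shot)": 63.99,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 74.03, matching the reported Avg.
```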