---
license: apache-2.0
library_name: transformers
tags:
- mergekit
- merge
base_model:
- lodrick-the-lafted/Olethros-8B
- lodrick-the-lafted/Limon-8B
- lodrick-the-lafted/Rummage-8B
- cgato/L3-TheSpice-8b-v0.8.3
- unsloth/llama-3-8b-Instruct
- Edgerunners/meta-llama-3-8b-instruct-hf-ortho-baukit-10fail-1000total
model-index:
- name: Kudzu-8B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 62.46
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Kudzu-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 80.28
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Kudzu-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 68.14
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Kudzu-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 52.77
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Kudzu-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 76.8
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Kudzu-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.08
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Kudzu-8B
      name: Open LLM Leaderboard
---

# Kudzu-8B

Fresh out of the mergekit-evolve kitchen, this is a merge of:

* [lodrick-the-lafted/Olethros-8B](https://huggingface.co/lodrick-the-lafted/Olethros-8B)
* [lodrick-the-lafted/Limon-8B](https://huggingface.co/lodrick-the-lafted/Limon-8B)
* [lodrick-the-lafted/Rummage-8B](https://huggingface.co/lodrick-the-lafted/Rummage-8B)
* [Edgerunners/meta-llama-3-8b-instruct-hf-ortho-baukit-10fail-1000total](https://huggingface.co/Edgerunners/meta-llama-3-8b-instruct-hf-ortho-baukit-10fail-1000total)
* [cgato/L3-TheSpice-8b-v0.8.3](https://huggingface.co/cgato/L3-TheSpice-8b-v0.8.3)

I used wmdp as the scoring metric for the evolve run.

In my limited testing, it avoids the usual Llama-3 "Ahaha!" interjections while retaining a good portion of the intelligence. There are several ablated models in the mix, so don't be surprised if it gives you what you ask for.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_lodrick-the-lafted__Kudzu-8B).

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |67.25|
|AI2 Reasoning Challenge (25-Shot)|62.46|
|HellaSwag (10-Shot)              |80.28|
|MMLU (5-Shot)                    |68.14|
|TruthfulQA (0-shot)              |52.77|
|Winogrande (5-shot)              |76.80|
|GSM8k (5-shot)                   |63.08|
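The leaderboard's Avg. row is the unweighted arithmetic mean of the six benchmark scores. A quick sanity check in Python (score values copied from the table above; the exact mean is 67.255, consistent with the reported 67.25):

```python
# Benchmark scores from the table above (Open LLM Leaderboard).
scores = {
    "ARC (25-shot)": 62.46,
    "HellaSwag (10-shot)": 80.28,
    "MMLU (5-shot)": 68.14,
    "TruthfulQA (0-shot)": 52.77,
    "Winogrande (5-shot)": 76.80,
    "GSM8k (5-shot)": 63.08,
}

# The leaderboard average is the plain mean of the six benchmark scores.
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg}")
```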