---
base_model:
- cstr/llama3.1-8b-spaetzle-v85
- cstr/llama3.1-8b-spaetzle-v86
- cstr/llama3.1-8b-spaetzle-v74
tags:
- merge
- mergekit
- lazymergekit
- cstr/llama3.1-8b-spaetzle-v85
- cstr/llama3.1-8b-spaetzle-v86
- cstr/llama3.1-8b-spaetzle-v74
license: llama3
language:
- en
- de
---

# llama3.1-8b-spaetzle-v90

llama3.1-8b-spaetzle-v90 is a progressive merge of merges.

EQ-Bench scores: 69.93 on v2_de (German, 171/171) and 77.88 on v2 (English, 171/171).

The merge tree involves the following models:

- NousResearch/Hermes-3-Llama-3.1-8B
- Undi95/Meta-Llama-3.1-8B-Claude
- Dampfinchen/Llama-3.1-8B-Ultra-Instruct
- VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
- akjindal53244/Llama-3.1-Storm-8B
- nbeerbower/llama3.1-gutenberg-8B
- Undi95/Meta-Llama-3.1-8B-Claude
- DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1
- nbeerbower/llama-3-wissenschaft-8B-v2
- Azure99/blossom-v5-llama3-8b
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- princeton-nlp/Llama-3-Instruct-8B-SimPO
- Locutusque/llama-3-neural-chat-v1-8b
- Locutusque/Llama-3-Orca-1.0-8B
- DiscoResearch/Llama3_DiscoLM_German_8b_v0.1_experimental
- seedboxai/Llama-3-Kafka-8B-v0.2
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- nbeerbower/llama-3-wissenschaft-8B-v2
- mlabonne/Daredevil-8B-abliterated-dpomix

The build involved a number of steps, among them slerp merges restricted to the middle layers to compensate for tokenizer / chat template differences between the source models. One such step is illustrated below.

## 🧩 Configuration

The final merge step used the following configuration:

```yaml
models:
  - model: cstr/llama3.1-8b-spaetzle-v59
    # no parameters necessary for base model
  - model: cstr/llama3.1-8b-spaetzle-v85
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v86
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v74
    parameters:
      density: 0.65
      weight: 0.3
merge_method: dare_ties
base_model: cstr/llama3.1-8b-spaetzle-v59
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```

Among the earlier steps, a slerp merge that blends only the middle layers:

```yaml
models:
  - model: NousResearch/Hermes-3-Llama-3.1-8B
merge_method: slerp
base_model: cstr/llama3.1-8b-spaetzle-v74
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0]
dtype: float16
```

## 💻 Usage

Use the standard Llama 3 chat template. GGUF quants for llama.cpp and wrappers such as ollama are available at [cstr/llama3.1-8b-spaetzle-v90-GGUF](https://huggingface.co/cstr/llama3.1-8b-spaetzle-v90-GGUF).
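
As a quick sketch of chat-template usage with 🤗 Transformers (the generation settings below are illustrative assumptions, not recommendations from this card):

```python
# Minimal, untested sketch: load the merged model with transformers and apply
# its Llama 3 chat template. Generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cstr/llama3.1-8b-spaetzle-v90"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain briefly what a model merge is."},
]

# apply_chat_template builds the Llama 3 prompt, including special tokens
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With llama.cpp or ollama, use one of the GGUF files from the repository linked above instead.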