---
tags:
- merge
- mergekit
- cstr/Spaetzle-v80-7b
- cstr/Spaetzle-v79-7b
- cstr/Spaetzle-v81-7b
- cstr/Spaetzle-v71-7b
base_model:
- cstr/Spaetzle-v80-7b
- cstr/Spaetzle-v79-7b
- cstr/Spaetzle-v81-7b
- cstr/Spaetzle-v71-7b
license: cc-by-nc-4.0
language:
- de
- en
---

# Spaetzle-v85-7b

Spaetzle-v85-7b is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [cstr/Spaetzle-v84-7b](https://huggingface.co/cstr/Spaetzle-v84-7b)
* [cstr/Spaetzle-v81-7b](https://huggingface.co/cstr/Spaetzle-v81-7b)
* [cstr/Spaetzle-v80-7b](https://huggingface.co/cstr/Spaetzle-v80-7b)
* [cstr/Spaetzle-v79-7b](https://huggingface.co/cstr/Spaetzle-v79-7b)
* [cstr/Spaetzle-v71-7b](https://huggingface.co/cstr/Spaetzle-v71-7b)

## Evaluation

EQ-Bench (v2_de): 65.32, Parseable: 171.0

|                            Model                             |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Spaetzle-v85-7b](https://huggingface.co/cstr/Spaetzle-v85-7b)|  44.35|  75.99|     67.23|   46.55|  58.53|

From [Intel/low_bit_open_llm_leaderboard](https://huggingface.co/datasets/Intel/ld_results/blob/main/cstr/Spaetzle-v85-7b-int4-inc/results_2024-06-12-21-00-34.json):

| Metric       | Value   |
|--------------|---------|
| ARC-c        | 62.63   |
| ARC-e        | 85.56   |
| Boolq        | 87.77   |
| HellaSwag    | 66.66   |
| Lambada      | 70.35   |
| MMLU         | 61.61   |
| Openbookqa   | 37.2    |
| Piqa         | 82.48   |
| Truthfulqa   | 50.43   |
| Winogrande   | 78.3    |
| Average      | 68.3    |

From [Occiglot Euro LLM Leaderboard](https://huggingface.co/spaces/occiglot/euro-llm-leaderboard):

| Model | 🇪🇺 Average ⬆️ | 🇩🇪 DE | 🇬🇧 EN | 🇬🇧ARC EN | 🇬🇧TruthfulQA EN | 🇬🇧Belebele EN | 🇬🇧HellaSwag EN | 🇬🇧MMLU EN | 🇩🇪ARC DE | 🇩🇪TruthfulQA DE | 🇩🇪Belebele DE | 🇩🇪HellaSwag DE | 🇩🇪MMLU DE |
|----------------------------------------------|----------------|--------|--------|-------------|------------------|----------------|----------------|------------|-------------|------------------|----------------|----------------|------------|
| mistral-community/Mixtral-8x22B-v0.1 | 68.3 | 66.81 | 72.87 | 70.56 | 52.29 | 93.89 | 70.41 | 77.17 | 63.9 | 29.31 | 92.44 | 77.9 | 70.49 |
| **cstr/Spaetzle-v85-7b** | 63.26 | 61.11 | 71.94 | 70.48 | 67.16 | 90.33 | 68.54 | 63.17 | 58.43 | 36.93 | 84.22 | 70.62 | 55.36 |
| cstr/Spaetzle-v60-7b | 63.32 | 60.95 | 71.65 | 69.88 | 66.24 | 90.11 | 68.43 | 63.59 | 58 | 37.31 | 84.22 | 70.09 | 55.11 |
| VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct | 64.49 | 60.07 | 74.71 | 74.49 | 66.19 | 91.67 | 74.55 | 66.65 | 59.37 | 29.57 | 88.56 | 66.43 | 56.44 |
| seedboxai/Llama-3-KafkaLM-8B-v0.1 | 62.27 | 59.67 | 69.75 | 69.03 | 58.14 | 90.78 | 64.35 | 66.43 | 57.66 | 30.33 | 85.89 | 66.88 | 57.58 |
| cstr/llama3-8b-spaetzle-v33 | 62.75 | 59.56 | 70.68 | 69.54 | 59.31 | 91.44 | 66.04 | 67.06 | 57.06 | 28.55 | 87.56 | 66.7 | 57.92 |

### AGIEval
|             Task             |Version| Metric |Value|   |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |23.23|±  |  2.65|
|                              |       |acc_norm|22.44|±  |  2.62|
|agieval_logiqa_en             |      0|acc     |37.33|±  |  1.90|
|                              |       |acc_norm|37.94|±  |  1.90|
|agieval_lsat_ar               |      0|acc     |25.22|±  |  2.87|
|                              |       |acc_norm|23.04|±  |  2.78|
|agieval_lsat_lr               |      0|acc     |49.41|±  |  2.22|
|                              |       |acc_norm|50.78|±  |  2.22|
|agieval_lsat_rc               |      0|acc     |64.68|±  |  2.92|
|                              |       |acc_norm|63.20|±  |  2.95|
|agieval_sat_en                |      0|acc     |77.67|±  |  2.91|
|                              |       |acc_norm|78.16|±  |  2.89|
|agieval_sat_en_without_passage|      0|acc     |46.12|±  |  3.48|
|                              |       |acc_norm|45.15|±  |  3.48|
|agieval_sat_math              |      0|acc     |35.45|±  |  3.23|
|                              |       |acc_norm|34.09|±  |  3.20|

Average: 44.35%

### GPT4All
|    Task     |Version| Metric |Value|   |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge|      0|acc     |63.82|±  |  1.40|
|             |       |acc_norm|64.76|±  |  1.40|
|arc_easy     |      0|acc     |85.90|±  |  0.71|
|             |       |acc_norm|82.32|±  |  0.78|
|boolq        |      1|acc     |87.61|±  |  0.58|
|hellaswag    |      0|acc     |67.39|±  |  0.47|
|             |       |acc_norm|85.36|±  |  0.35|
|openbookqa   |      0|acc     |38.80|±  |  2.18|
|             |       |acc_norm|48.80|±  |  2.24|
|piqa         |      0|acc     |83.03|±  |  0.88|
|             |       |acc_norm|84.17|±  |  0.85|
|winogrande   |      0|acc     |78.93|±  |  1.15|

Average: 75.99%

### TruthfulQA
|    Task     |Version|Metric|Value|   |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc|      1|mc1   |50.80|±  |  1.75|
|             |       |mc2   |67.23|±  |  1.49|

Average: 67.23%

### Bigbench
|                      Task                      |Version|       Metric        |Value|   |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|54.74|±  |  3.62|
|bigbench_date_understanding                     |      0|multiple_choice_grade|68.29|±  |  2.43|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|39.53|±  |  3.05|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|22.28|±  |  2.20|
|                                                |       |exact_str_match      |12.26|±  |  1.73|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|32.80|±  |  2.10|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|23.00|±  |  1.59|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|59.00|±  |  2.84|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|45.60|±  |  2.23|
|bigbench_navigate                               |      0|multiple_choice_grade|51.10|±  |  1.58|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|70.10|±  |  1.02|
|bigbench_ruin_names                             |      0|multiple_choice_grade|52.68|±  |  2.36|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|33.57|±  |  1.50|
|bigbench_snarks                                 |      0|multiple_choice_grade|71.27|±  |  3.37|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|74.54|±  |  1.39|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|40.00|±  |  1.55|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|21.52|±  |  1.16|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|18.86|±  |  0.94|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|59.00|±  |  2.84|

Average: 46.55%

Average score: 58.53%

## 🧩 Configuration

```yaml
models:
  - model: cstr/Spaetzle-v84-7b
    # no parameters necessary for base model
  - model: cstr/Spaetzle-v80-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v79-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v81-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v71-7b
    parameters:
      density: 0.65
      weight: 0.2
merge_method: dare_ties
base_model: cstr/Spaetzle-v84-7b
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```

## 💻 Usage

```python
# Install the dependencies first. In a notebook cell:
#   !pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v85-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline; device_map="auto" places the weights via accelerate
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion and print it
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
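
For intuition about the `density` and `weight` values in the configuration above, here is a toy sketch of the drop-and-rescale (DARE) idea on plain Python lists. It is not mergekit's implementation: the function names `dare_delta` and `toy_dare_merge` are hypothetical, and the sketch leaves out the TIES sign-election step as well as any weight normalization mergekit may apply. It only shows that keeping each delta entry with probability `density` and dividing the kept entries by `density` preserves the expected size of the update.

```python
import random


def dare_delta(base, tuned, density, rng):
    """Drop-and-rescale one task vector (toy list-of-floats version)."""
    out = []
    for b, t in zip(base, tuned):
        delta = t - b                      # task vector entry
        keep = rng.random() < density      # keep with probability `density`
        out.append(delta / density if keep else 0.0)  # rescale survivors
    return out


def toy_dare_merge(base, tuned_models, density, weight, seed=0):
    """Add the weighted, sparsified deltas of each model onto the base."""
    rng = random.Random(seed)
    merged = list(base)
    for tuned in tuned_models:
        delta = dare_delta(base, tuned, density, rng)
        merged = [m + weight * d for m, d in zip(merged, delta)]
    return merged


# Toy data (made up for illustration): base is all zeros, and each of the
# four "fine-tuned" models is all ones, so every delta entry equals 1.0.
base = [0.0] * 10000
tuned = [[1.0] * 10000 for _ in range(4)]

merged = toy_dare_merge(base, tuned, density=0.65, weight=0.2)
mean = sum(merged) / len(merged)
print(f"mean merged value: {mean:.3f}")  # close to 4 * 0.2 = 0.8 on average
```

Individual entries differ because each model's delta is dropped independently, but the mean stays near the sum of the weights times the original delta, which is the point of the rescaling.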