
Spaetzle-v85-7b

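Spaetzle-v85-7b is a merge of the following models, made with LazyMergekit: cstr/Spaetzle-v84-7b (base model), cstr/Spaetzle-v80-7b, cstr/Spaetzle-v79-7b, cstr/Spaetzle-v81-7b, and cstr/Spaetzle-v71-7b.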

Evaluation

EQ-Bench (v2_de): 65.32, Parseable: 171.0

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Spaetzle-v85-7b | 44.35 | 75.99 | 67.23 | 46.55 | 58.53 |

From Intel/low_bit_open_llm_leaderboard:

| Metric | Value |
|---|---|
| ARC-c | 62.63 |
| ARC-e | 85.56 |
| Boolq | 87.77 |
| HellaSwag | 66.66 |
| Lambada | 70.35 |
| MMLU | 61.61 |
| Openbookqa | 37.2 |
| Piqa | 82.48 |
| Truthfulqa | 50.43 |
| Winogrande | 78.3 |
| Average | 68.3 |
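
The Average row appears to be the unweighted mean of the ten metrics listed with it. That reading is inferred from the numbers, not stated by the leaderboard here. A quick check, with the scores copied from this card:

```python
# Scores as listed in this card for Intel/low_bit_open_llm_leaderboard.
scores = {
    "ARC-c": 62.63,
    "ARC-e": 85.56,
    "Boolq": 87.77,
    "HellaSwag": 66.66,
    "Lambada": 70.35,
    "MMLU": 61.61,
    "Openbookqa": 37.2,
    "Piqa": 82.48,
    "Truthfulqa": 50.43,
    "Winogrande": 78.3,
}

# Assumption: "Average" is the plain arithmetic mean of all ten metrics.
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}")
```

The result rounds to 68.3, which agrees with the reported Average.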

From Occiglot Euro LLM Leaderboard:

| Model | Average | DE | EN | ARC EN | TruthfulQA EN | Belebele EN | HellaSwag EN | MMLU EN | ARC DE | TruthfulQA DE | Belebele DE | HellaSwag DE | MMLU DE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mistral-community/Mixtral-8x22B-v0.1 | 68.3 | 66.81 | 72.87 | 70.56 | 52.29 | 93.89 | 70.41 | 77.17 | 63.9 | 29.31 | 92.44 | 77.9 | 70.49 |
| cstr/Spaetzle-v85-7b | 63.26 | 61.11 | 71.94 | 70.48 | 67.16 | 90.33 | 68.54 | 63.17 | 58.43 | 36.93 | 84.22 | 70.62 | 55.36 |
| cstr/Spaetzle-v60-7b | 63.32 | 60.95 | 71.65 | 69.88 | 66.24 | 90.11 | 68.43 | 63.59 | 58 | 37.31 | 84.22 | 70.09 | 55.11 |
| VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct | 64.49 | 60.07 | 74.71 | 74.49 | 66.19 | 91.67 | 74.55 | 66.65 | 59.37 | 29.57 | 88.56 | 66.43 | 56.44 |
| seedboxai/Llama-3-KafkaLM-8B-v0.1 | 62.27 | 59.67 | 69.75 | 69.03 | 58.14 | 90.78 | 64.35 | 66.43 | 57.66 | 30.33 | 85.89 | 66.88 | 57.58 |
| cstr/llama3-8b-spaetzle-v33 | 62.75 | 59.56 | 70.68 | 69.54 | 59.31 | 91.44 | 66.04 | 67.06 | 57.06 | 28.55 | 87.56 | 66.7 | 57.92 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.23 | ± 2.65 |
| | | acc_norm | 22.44 | ± 2.62 |
| agieval_logiqa_en | 0 | acc | 37.33 | ± 1.90 |
| | | acc_norm | 37.94 | ± 1.90 |
| agieval_lsat_ar | 0 | acc | 25.22 | ± 2.87 |
| | | acc_norm | 23.04 | ± 2.78 |
| agieval_lsat_lr | 0 | acc | 49.41 | ± 2.22 |
| | | acc_norm | 50.78 | ± 2.22 |
| agieval_lsat_rc | 0 | acc | 64.68 | ± 2.92 |
| | | acc_norm | 63.20 | ± 2.95 |
| agieval_sat_en | 0 | acc | 77.67 | ± 2.91 |
| | | acc_norm | 78.16 | ± 2.89 |
| agieval_sat_en_without_passage | 0 | acc | 46.12 | ± 3.48 |
| | | acc_norm | 45.15 | ± 3.48 |
| agieval_sat_math | 0 | acc | 35.45 | ± 3.23 |
| | | acc_norm | 34.09 | ± 3.20 |

Average: 44.35%

GPT4All

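| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 63.82 | ± 1.40 |
| | | acc_norm | 64.76 | ± 1.40 |
| arc_easy | 0 | acc | 85.90 | ± 0.71 |
| | | acc_norm | 82.32 | ± 0.78 |
| boolq | 1 | acc | 87.61 | ± 0.58 |
| hellaswag | 0 | acc | 67.39 | ± 0.47 |
| | | acc_norm | 85.36 | ± 0.35 |
| openbookqa | 0 | acc | 38.80 | ± 2.18 |
| | | acc_norm | 48.80 | ± 2.24 |
| piqa | 0 | acc | 83.03 | ± 0.88 |
| | | acc_norm | 84.17 | ± 0.85 |
| winogrande | 0 | acc | 78.93 | ± 1.15 |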

Average: 75.99%

TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 50.80 | ± 1.75 |
| | | mc2 | 67.23 | ± 1.49 |

Average: 67.23%

Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 54.74 | ± 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 68.29 | ± 2.43 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 39.53 | ± 3.05 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.28 | ± 2.20 |
| | | exact_str_match | 12.26 | ± 1.73 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 32.80 | ± 2.10 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.00 | ± 1.59 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 59.00 | ± 2.84 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 45.60 | ± 2.23 |
| bigbench_navigate | 0 | multiple_choice_grade | 51.10 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 70.10 | ± 1.02 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 52.68 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 33.57 | ± 1.50 |
| bigbench_snarks | 0 | multiple_choice_grade | 71.27 | ± 3.37 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 74.54 | ± 1.39 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 40.00 | ± 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.52 | ± 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 18.86 | ± 0.94 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 59.00 | ± 2.84 |

Average: 46.55%

Average score: 58.53%
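
The 58.53% figure matches the unweighted mean of the four suite averages. The suite averages are in turn consistent with taking acc_norm where it is reported and acc otherwise. Both readings are inferred from the numbers in this card rather than stated in it. A short check, shown for the overall score and for AGIEval:

```python
from statistics import mean

# Suite averages as reported in this card.
suite_averages = {
    "AGIEval": 44.35,
    "GPT4All": 75.99,
    "TruthfulQA": 67.23,
    "Bigbench": 46.55,
}

# Assumption: the overall score is the plain mean of the four suites.
overall = mean(suite_averages.values())
print(f"overall: {overall:.2f}")

# AGIEval acc_norm values, in the order the tasks are listed in the card.
agieval_acc_norm = [22.44, 37.94, 23.04, 50.78, 63.20, 78.16, 45.15, 34.09]
agieval = mean(agieval_acc_norm)
print(f"AGIEval: {agieval:.2f}")
```

Both values agree with the reported 58.53 and 44.35 to two decimal places.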

🧩 Configuration

models:
  - model: cstr/Spaetzle-v84-7b
    # no parameters necessary for base model
  - model: cstr/Spaetzle-v80-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v79-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v81-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v71-7b
    parameters:
      density: 0.65
      weight: 0.2
merge_method: dare_ties
base_model: cstr/Spaetzle-v84-7b
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
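
To give a rough intuition for what `density` and `weight` control under `merge_method: dare_ties`, here is a toy sketch on plain Python lists. It is not mergekit's implementation: the function names `dare` and `dare_ties_merge` are made up for illustration, real merges operate on full tensors, and options such as `int8_mask` are ignored. The idea sketched is: take each model's difference from the base, randomly keep a `density` fraction of it and rescale the survivors, scale by `weight`, then add only the contributions whose sign agrees with the overall direction.

```python
import random


def dare(delta, density, rng):
    """Keep each entry with probability `density`; rescale kept entries by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Toy DARE + TIES-style merge on flat lists of floats (illustrative only)."""
    rng = random.Random(seed)
    # Task vectors: how each fine-tuned model differs from the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # Sparsify and rescale each task vector, then apply its merge weight.
    weighted = [
        [w * d for d in dare(delta, density, rng)]
        for delta, w in zip(deltas, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [wd[i] for wd in weighted]
        # Sign election: the sign of the summed contributions wins.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Only contributions that agree with the elected sign are added.
        merged.append(b + sum(c for c in column if c * sign > 0))
    return merged


# Toy example with four "models", mirroring density 0.65 and weight 0.2.
base = [0.0, 1.0, -0.5, 2.0]
finetuned = [
    [0.1, 1.2, -0.4, 1.9],
    [0.2, 0.9, -0.6, 2.1],
    [-0.1, 1.1, -0.5, 2.2],
    [0.3, 1.0, -0.3, 1.8],
]
merged = dare_ties_merge(base, finetuned, weights=[0.2] * 4, density=0.65)
print(merged)
```

With `density=1.0` nothing is dropped and the sketch reduces to a weighted, sign-filtered sum of the task vectors, which makes it easy to sanity-check by hand.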

💻 Usage

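# Notebook cell: install the dependencies (in a plain shell, drop the leading "!").
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v85-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages into a single prompt string with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline; device_map="auto" places the weights on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the generated text (the prompt is included).
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])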