
Spaetzle-v58-7b

This model is an experiment in merges involving the somewhat cumbersome Occiglot. This one performs reasonably well, with an EQ-Bench score (v2_de) of 61.52 and an English EQ-Bench score (v2) of 75.69. However, it still produces some unwanted tokens, and while the benchmark results could likely be pushed higher, so far that has come at a tradeoff with perceived German language quality.

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Spaetzle-v58-7b | 44.03 | 75.5 | 60.77 | 45.78 | 56.52 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 22.83 | ± 2.64 |
| | | acc_norm | 22.83 | ± 2.64 |
| agieval_logiqa_en | 0 | acc | 37.94 | ± 1.90 |
| | | acc_norm | 39.78 | ± 1.92 |
| agieval_lsat_ar | 0 | acc | 23.48 | ± 2.80 |
| | | acc_norm | 21.74 | ± 2.73 |
| agieval_lsat_lr | 0 | acc | 48.63 | ± 2.22 |
| | | acc_norm | 50.78 | ± 2.22 |
| agieval_lsat_rc | 0 | acc | 62.45 | ± 2.96 |
| | | acc_norm | 61.71 | ± 2.97 |
| agieval_sat_en | 0 | acc | 77.18 | ± 2.93 |
| | | acc_norm | 75.73 | ± 2.99 |
| agieval_sat_en_without_passage | 0 | acc | 46.12 | ± 3.48 |
| | | acc_norm | 45.15 | ± 3.48 |
| agieval_sat_math | 0 | acc | 37.27 | ± 3.27 |
| | | acc_norm | 34.55 | ± 3.21 |

Average: 44.03%
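As a sanity check, the 44.03% figure is the simple mean of the acc_norm values in the table above (reading of the averaging rule inferred from the numbers; the harness could in principle mix in acc for some tasks):

```python
# AGIEval acc_norm values, in table order (aqua_rat ... sat_math)
acc_norm = [22.83, 39.78, 21.74, 50.78, 61.71, 75.73, 45.15, 34.55]
average = sum(acc_norm) / len(acc_norm)
print(round(average, 2))  # 44.03
```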

GPT4All

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 61.86 | ± 1.42 |
| | | acc_norm | 62.80 | ± 1.41 |
| arc_easy | 0 | acc | 85.31 | ± 0.73 |
| | | acc_norm | 82.58 | ± 0.78 |
| boolq | 1 | acc | 87.80 | ± 0.57 |
| hellaswag | 0 | acc | 66.07 | ± 0.47 |
| | | acc_norm | 84.37 | ± 0.36 |
| openbookqa | 0 | acc | 38.20 | ± 2.18 |
| | | acc_norm | 49.00 | ± 2.24 |
| piqa | 0 | acc | 82.54 | ± 0.89 |
| | | acc_norm | 84.44 | ± 0.85 |
| winogrande | 0 | acc | 77.51 | ± 1.17 |

Average: 75.5%
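The 75.5% average is consistent with taking acc_norm where it is reported and plain acc otherwise (boolq and winogrande only report acc). This selection rule is an assumption inferred from the numbers, not stated in the eval output:

```python
# acc_norm where available; plain acc for boolq and winogrande
scores = {
    "arc_challenge": 62.80, "arc_easy": 82.58, "boolq": 87.80,
    "hellaswag": 84.37, "openbookqa": 49.00, "piqa": 84.44,
    "winogrande": 77.51,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 75.5
```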

TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 44.55 | ± 1.74 |
| | | mc2 | 60.77 | ± 1.54 |

Average: 60.77%

Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 56.84 | ± 3.60 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.40 | ± 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 35.27 | ± 2.98 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 36.21 | ± 2.54 |
| | | exact_str_match | 18.11 | ± 2.04 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 32.20 | ± 2.09 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.00 | ± 1.59 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 56.33 | ± 2.87 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 42.40 | ± 2.21 |
| bigbench_navigate | 0 | multiple_choice_grade | 50.10 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 70.50 | ± 1.02 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 45.09 | ± 2.35 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 36.97 | ± 1.53 |
| bigbench_snarks | 0 | multiple_choice_grade | 71.82 | ± 3.35 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 69.78 | ± 1.46 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 35.50 | ± 1.51 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.52 | ± 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.83 | ± 0.92 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 56.33 | ± 2.87 |

Average: 45.78%

Average score: 56.52%
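The overall score is the unweighted mean of the four benchmark averages above:

```python
benchmarks = {"AGIEval": 44.03, "GPT4All": 75.5,
              "TruthfulQA": 60.77, "Bigbench": 45.78}
overall = sum(benchmarks.values()) / len(benchmarks)
print(round(overall, 2))  # 56.52
```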

Elapsed time: 02:03:03

Spaetzle-v58-7b is a merge of cstr/Spaetzle-v31-7b and cstr/Spaetzle-v12-7b onto the base model cstr/Spaetzle-v57-7b, using LazyMergekit:

🧩 Configuration

```yaml
models:
  - model: cstr/Spaetzle-v57-7b
    # no parameters necessary for base model
  - model: cstr/Spaetzle-v31-7b
    parameters:
      density: 0.60
      weight: 0.30
  - model: cstr/Spaetzle-v12-7b
    parameters:
      density: 0.65
      weight: 0.30
merge_method: dare_ties
base_model: cstr/Spaetzle-v57-7b
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
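The dare_ties method keeps only a random `density` fraction of each model's parameter delta relative to the base, rescales the survivors so the expectation is unchanged, then adds the weighted, sign-reconciled deltas back onto the base. A minimal NumPy sketch of the drop-and-rescale (DARE) step alone, on made-up toy tensors (not mergekit's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dare_sparsify(delta, density, rng):
    """Randomly keep a `density` fraction of the delta (task vector)
    and rescale survivors by 1/density, preserving the expected value."""
    mask = rng.random(delta.shape) < density
    return np.where(mask, delta / density, 0.0)

# Toy parameter deltas standing in for the two non-base models above
delta_v31 = np.array([0.4, -0.2, 0.1, 0.3, -0.5])
delta_v12 = np.array([-0.1, 0.2, 0.4, -0.3, 0.2])

base = np.zeros(5)
merged = (base
          + 0.30 * dare_sparsify(delta_v31, 0.60, rng)   # weight / density from config
          + 0.30 * dare_sparsify(delta_v12, 0.65, rng))
```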

💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v58-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Model size: 7.24B params (Safetensors, BF16).