# Spaetzle-v31-7b

Spaetzle-v31-7b is a merge of the following models using LazyMergekit:

* [cstr/spaetzle-v8-7b](https://huggingface.co/cstr/spaetzle-v8-7b) (base model)
* [yleo/EmertonMonarch-7B](https://huggingface.co/yleo/EmertonMonarch-7B)

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------|--------:|--------:|-----------:|---------:|--------:|
| Spaetzle-v31-7b | 46.23 | 76.6 | 69.58 | 46.79 | 59.8 |

### AGIEval

| Task | Version | Metric | Value | Stderr |
|------|--------:|--------|------:|:-------|
| agieval_aqua_rat | 0 | acc | 28.74 | ± 2.85 |
| | | acc_norm | 27.56 | ± 2.81 |
| agieval_logiqa_en | 0 | acc | 39.63 | ± 1.92 |
| | | acc_norm | 40.25 | ± 1.92 |
| agieval_lsat_ar | 0 | acc | 24.35 | ± 2.84 |
| | | acc_norm | 24.35 | ± 2.84 |
| agieval_lsat_lr | 0 | acc | 54.31 | ± 2.21 |
| | | acc_norm | 54.12 | ± 2.21 |
| agieval_lsat_rc | 0 | acc | 65.80 | ± 2.90 |
| | | acc_norm | 66.54 | ± 2.88 |
| agieval_sat_en | 0 | acc | 79.13 | ± 2.84 |
| | | acc_norm | 79.61 | ± 2.81 |
| agieval_sat_en_without_passage | 0 | acc | 46.12 | ± 3.48 |
| | | acc_norm | 45.15 | ± 3.48 |
| agieval_sat_math | 0 | acc | 35.00 | ± 3.22 |
| | | acc_norm | 32.27 | ± 3.16 |

Average: 46.23%

### GPT4All

| Task | Version | Metric | Value | Stderr |
|------|--------:|--------|------:|:-------|
| arc_challenge | 0 | acc | 64.76 | ± 1.40 |
| | | acc_norm | 66.89 | ± 1.38 |
| arc_easy | 0 | acc | 86.66 | ± 0.70 |
| | | acc_norm | 82.83 | ± 0.77 |
| boolq | 1 | acc | 87.80 | ± 0.57 |
| hellaswag | 0 | acc | 67.43 | ± 0.47 |
| | | acc_norm | 85.85 | ± 0.35 |
| openbookqa | 0 | acc | 38.00 | ± 2.17 |
| | | acc_norm | 48.80 | ± 2.24 |
| piqa | 0 | acc | 83.57 | ± 0.86 |
| | | acc_norm | 84.71 | ± 0.84 |
| winogrande | 0 | acc | 79.32 | ± 1.14 |

Average: 76.6%

### TruthfulQA

| Task | Version | Metric | Value | Stderr |
|------|--------:|--------|------:|:-------|
| truthfulqa_mc | 1 | mc1 | 53.37 | ± 1.75 |
| | | mc2 | 69.58 | ± 1.48 |

Average: 69.58%

### Bigbench

| Task | Version | Metric | Value | Stderr |
|------|--------:|--------|------:|:-------|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 56.84 | ± 3.60 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.94 | ± 2.45 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 44.57 | ± 3.10 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 21.17 | ± 2.16 |
| | | exact_str_match | 0.28 | ± 0.28 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 31.80 | ± 2.08 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 22.57 | ± 1.58 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 45.40 | ± 2.23 |
| bigbench_navigate | 0 | multiple_choice_grade | 52.80 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 70.65 | ± 1.02 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 50.67 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 30.66 | ± 1.46 |
| bigbench_snarks | 0 | multiple_choice_grade | 71.27 | ± 3.37 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 74.34 | ± 1.39 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 49.80 | ± 1.58 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.16 | ± 1.18 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 18.57 | ± 0.93 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |

Average: 46.79%

Average score: 59.8% (the mean of the four benchmark averages above)

Elapsed time: 02:09:50

## 🧩 Configuration

```yaml
models:
  - model: cstr/spaetzle-v8-7b
    # no parameters necessary for base model
  - model: yleo/EmertonMonarch-7B
    parameters:
      density: 0.60
      weight: 0.3
merge_method: dare_ties
base_model: cstr/spaetzle-v8-7b
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
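For intuition on what `dare_ties` does with these settings: the donor's task vector (its parameter delta against the base) is randomly sparsified to the configured `density`, rescaled so its expected magnitude is preserved, and blended into the base at the given `weight`. Below is a minimal single-tensor sketch of that drop-and-rescale step. It is an illustration under our own naming (`dare_delta` is hypothetical), not mergekit's implementation, and it omits TIES sign election, which is trivial here with only one donor model.

```python
import torch

def dare_delta(base: torch.Tensor, finetuned: torch.Tensor,
               density: float, weight: float) -> torch.Tensor:
    """Drop-And-REscale sketch: keep a random `density` fraction of the
    task vector (finetuned - base), rescale survivors by 1/density so the
    expected update is unchanged, then scale the result by `weight`."""
    delta = finetuned - base
    keep = torch.rand_like(delta) < density  # Bernoulli keep-mask
    return weight * (keep * delta) / density

# Toy tensors standing in for one weight matrix of the two models.
base = torch.randn(4, 4)
donor = base + 0.1 * torch.randn(4, 4)

# With density=0.60 about 40% of the delta is dropped; weight=0.3
# controls how far the donor pulls the merged tensor off the base.
merged = base + dare_delta(base, donor, density=0.60, weight=0.3)
```

mergekit applies this per tensor across the whole model; as we understand the options, `int8_mask: true` keeps the sparsification masks in int8 to save memory, and `random_seed: 0` makes the random drops reproducible.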

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v31-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt from the model's chat template, then generate.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
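If you prefer to work without the pipeline wrapper, here is a minimal sketch that loads the model directly with `AutoModelForCausalLM`; loading in bfloat16 (to match the merge dtype) is our choice here, not a requirement:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cstr/Spaetzle-v31-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is a large language model?"}]
# apply_chat_template can tokenize directly and return input ids.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```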