---
tags:
  - merge
  - mergekit
  - cstr/Spaetzle-v80-7b
  - cstr/Spaetzle-v79-7b
  - cstr/Spaetzle-v81-7b
  - cstr/Spaetzle-v71-7b
base_model:
  - cstr/Spaetzle-v80-7b
  - cstr/Spaetzle-v79-7b
  - cstr/Spaetzle-v81-7b
  - cstr/Spaetzle-v71-7b
license: cc-by-nc-4.0
language:
  - de
  - en
---

# Spaetzle-v85-7b

Spaetzle-v85-7b is a dare_ties merge of the following models using LazyMergekit, with [cstr/Spaetzle-v84-7b](https://huggingface.co/cstr/Spaetzle-v84-7b) as the base model:

* [cstr/Spaetzle-v80-7b](https://huggingface.co/cstr/Spaetzle-v80-7b)
* [cstr/Spaetzle-v79-7b](https://huggingface.co/cstr/Spaetzle-v79-7b)
* [cstr/Spaetzle-v81-7b](https://huggingface.co/cstr/Spaetzle-v81-7b)
* [cstr/Spaetzle-v71-7b](https://huggingface.co/cstr/Spaetzle-v71-7b)

## Evaluation

EQ-Bench (v2_de): 65.32, Parseable: 171.0

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Spaetzle-v85-7b | 44.35 | 75.99 | 67.23 | 46.55 | 58.53 |

From Intel/low_bit_open_llm_leaderboard:

| Metric | Value |
|---|---:|
| ARC-c | 62.63 |
| ARC-e | 85.56 |
| Boolq | 87.77 |
| HellaSwag | 66.66 |
| Lambada | 70.35 |
| MMLU | 61.61 |
| Openbookqa | 37.20 |
| Piqa | 82.48 |
| Truthfulqa | 50.43 |
| Winogrande | 78.30 |
| Average | 68.30 |

From the Occiglot Euro LLM Leaderboard:

| Model | 🇪🇺 Average ⬆️ | 🇩🇪 DE | 🇬🇧 EN | 🇬🇧 ARC EN | 🇬🇧 TruthfulQA EN | 🇬🇧 Belebele EN | 🇬🇧 HellaSwag EN | 🇬🇧 MMLU EN | 🇩🇪 ARC DE | 🇩🇪 TruthfulQA DE | 🇩🇪 Belebele DE | 🇩🇪 HellaSwag DE | 🇩🇪 MMLU DE |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| mistral-community/Mixtral-8x22B-v0.1 | 68.30 | 66.81 | 72.87 | 70.56 | 52.29 | 93.89 | 70.41 | 77.17 | 63.90 | 29.31 | 92.44 | 77.90 | 70.49 |
| cstr/Spaetzle-v85-7b | 63.26 | 61.11 | 71.94 | 70.48 | 67.16 | 90.33 | 68.54 | 63.17 | 58.43 | 36.93 | 84.22 | 70.62 | 55.36 |
| cstr/Spaetzle-v60-7b | 63.32 | 60.95 | 71.65 | 69.88 | 66.24 | 90.11 | 68.43 | 63.59 | 58.00 | 37.31 | 84.22 | 70.09 | 55.11 |
| VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct | 64.49 | 60.07 | 74.71 | 74.49 | 66.19 | 91.67 | 74.55 | 66.65 | 59.37 | 29.57 | 88.56 | 66.43 | 56.44 |
| seedboxai/Llama-3-KafkaLM-8B-v0.1 | 62.27 | 59.67 | 69.75 | 69.03 | 58.14 | 90.78 | 64.35 | 66.43 | 57.66 | 30.33 | 85.89 | 66.88 | 57.58 |
| cstr/llama3-8b-spaetzle-v33 | 62.75 | 59.56 | 70.68 | 69.54 | 59.31 | 91.44 | 66.04 | 67.06 | 57.06 | 28.55 | 87.56 | 66.70 | 57.92 |

### AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 23.23 | ± | 2.65 |
| | | acc_norm | 22.44 | ± | 2.62 |
| agieval_logiqa_en | 0 | acc | 37.33 | ± | 1.90 |
| | | acc_norm | 37.94 | ± | 1.90 |
| agieval_lsat_ar | 0 | acc | 25.22 | ± | 2.87 |
| | | acc_norm | 23.04 | ± | 2.78 |
| agieval_lsat_lr | 0 | acc | 49.41 | ± | 2.22 |
| | | acc_norm | 50.78 | ± | 2.22 |
| agieval_lsat_rc | 0 | acc | 64.68 | ± | 2.92 |
| | | acc_norm | 63.20 | ± | 2.95 |
| agieval_sat_en | 0 | acc | 77.67 | ± | 2.91 |
| | | acc_norm | 78.16 | ± | 2.89 |
| agieval_sat_en_without_passage | 0 | acc | 46.12 | ± | 3.48 |
| | | acc_norm | 45.15 | ± | 3.48 |
| agieval_sat_math | 0 | acc | 35.45 | ± | 3.23 |
| | | acc_norm | 34.09 | ± | 3.20 |

Average: 44.35%

### GPT4All

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| arc_challenge | 0 | acc | 63.82 | ± | 1.40 |
| | | acc_norm | 64.76 | ± | 1.40 |
| arc_easy | 0 | acc | 85.90 | ± | 0.71 |
| | | acc_norm | 82.32 | ± | 0.78 |
| boolq | 1 | acc | 87.61 | ± | 0.58 |
| hellaswag | 0 | acc | 67.39 | ± | 0.47 |
| | | acc_norm | 85.36 | ± | 0.35 |
| openbookqa | 0 | acc | 38.80 | ± | 2.18 |
| | | acc_norm | 48.80 | ± | 2.24 |
| piqa | 0 | acc | 83.03 | ± | 0.88 |
| | | acc_norm | 84.17 | ± | 0.85 |
| winogrande | 0 | acc | 78.93 | ± | 1.15 |

Average: 75.99%

### TruthfulQA

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| truthfulqa_mc | 1 | mc1 | 50.80 | ± | 1.75 |
| | | mc2 | 67.23 | ± | 1.49 |

Average: 67.23%

### Bigbench

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 54.74 | ± | 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 68.29 | ± | 2.43 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 39.53 | ± | 3.05 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.28 | ± | 2.20 |
| | | exact_str_match | 12.26 | ± | 1.73 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 32.80 | ± | 2.10 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.00 | ± | 1.59 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 59.00 | ± | 2.84 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 45.60 | ± | 2.23 |
| bigbench_navigate | 0 | multiple_choice_grade | 51.10 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 70.10 | ± | 1.02 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 52.68 | ± | 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 33.57 | ± | 1.50 |
| bigbench_snarks | 0 | multiple_choice_grade | 71.27 | ± | 3.37 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 74.54 | ± | 1.39 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 40.00 | ± | 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.52 | ± | 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 18.86 | ± | 0.94 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 59.00 | ± | 2.84 |

Average: 46.55%

Average score: 58.53%
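
The per-task tables above use the EleutherAI lm-evaluation-harness output format (Nous benchmark suite). As a rough sketch of how comparable numbers could be re-run with the harness's Python API — the exact task names and the `simple_evaluate` signature depend on the installed lm-eval version, so treat both as assumptions rather than something this card prescribes:

```python
# Sketch only: assumes `pip install lm-eval` and that these task names
# exist in the installed harness version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cstr/Spaetzle-v85-7b,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2", "winogrande"],
    batch_size=8,
)

# Print the aggregate metrics per task.
for task, metrics in results["results"].items():
    print(task, metrics)
```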

## 🧩 Configuration

```yaml
models:
  - model: cstr/Spaetzle-v84-7b
    # no parameters necessary for base model
  - model: cstr/Spaetzle-v80-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v79-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v81-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v71-7b
    parameters:
      density: 0.65
      weight: 0.2
merge_method: dare_ties
base_model: cstr/Spaetzle-v84-7b
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
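
To reproduce the merge, the YAML above can be fed to mergekit. A minimal sketch, assuming the config is saved as `config.yaml` and that the installed mergekit exposes the `MergeConfiguration`/`run_merge`/`MergeOptions` API used by LazyMergekit:

```python
# Sketch only: assumes `pip install mergekit` and an API compatible with
# the mergekit versions LazyMergekit uses; "config.yaml" holds the YAML above.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./Spaetzle-v85-7b",
    options=MergeOptions(cuda=False, copy_tokenizer=True, lazy_unpickle=False),
)
```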

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v85-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the chat turn with the model's own chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
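
The same chat turn can also be run without the pipeline wrapper; a minimal sketch using `AutoModelForCausalLM` and `generate` directly, where the sampling parameters simply mirror the example above and are illustrative, not prescribed by this card:

```python
# Sketch only: direct generate() call instead of transformers.pipeline.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "cstr/Spaetzle-v85-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "What is a large language model?"}]
# Tokenize the chat turn and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```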