|
--- |
|
tags: |
|
- merge |
|
- mergekit |
|
- cstr/Spaetzle-v80-7b |
|
- cstr/Spaetzle-v79-7b |
|
- cstr/Spaetzle-v81-7b |
|
- cstr/Spaetzle-v71-7b

- cstr/Spaetzle-v84-7b
|
base_model: |
|
- cstr/Spaetzle-v80-7b |
|
- cstr/Spaetzle-v79-7b |
|
- cstr/Spaetzle-v81-7b |
|
- cstr/Spaetzle-v71-7b

- cstr/Spaetzle-v84-7b
|
license: cc-by-nc-4.0 |
|
language: |
|
- de |
|
- en |
|
--- |
|
|
|
# Spaetzle-v85-7b |
|
|
|
Spaetzle-v85-7b is a dare_ties merge of the following models, built with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing) using cstr/Spaetzle-v84-7b as the base model:
|
* [cstr/Spaetzle-v84-7b](https://huggingface.co/cstr/Spaetzle-v84-7b) |
|
* [cstr/Spaetzle-v81-7b](https://huggingface.co/cstr/Spaetzle-v81-7b) |
|
* [cstr/Spaetzle-v80-7b](https://huggingface.co/cstr/Spaetzle-v80-7b) |
|
* [cstr/Spaetzle-v79-7b](https://huggingface.co/cstr/Spaetzle-v79-7b) |
|
* [cstr/Spaetzle-v71-7b](https://huggingface.co/cstr/Spaetzle-v71-7b) |
|
|
|
## Evaluation |
|
|
|
EQ-Bench (v2_de): 65.32 (171 parseable answers)
|
|
|
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average| |
|
|--------------------------------------------------------------|------:|------:|---------:|-------:|------:| |
|
|[Spaetzle-v85-7b](https://huggingface.co/cstr/Spaetzle-v85-7b)| 44.35| 75.99| 67.23| 46.55| 58.53| |
|
|
|
|
|
From [Intel/low_bit_open_llm_leaderboard](https://huggingface.co/datasets/Intel/ld_results/blob/main/cstr/Spaetzle-v85-7b-int4-inc/results_2024-06-12-21-00-34.json) (int4 quantization of this model, accuracy in %):
|
|
|
| Metric | Value | |
|
|--------------|---------| |
|
| ARC-c | 62.63 | |
|
| ARC-e | 85.56 | |
|
| Boolq | 87.77 | |
|
| HellaSwag | 66.66 | |
|
| Lambada | 70.35 | |
|
| MMLU | 61.61 | |
|
| Openbookqa | 37.2 | |
|
| Piqa | 82.48 | |
|
| Truthfulqa | 50.43 | |
|
| Winogrande | 78.3 | |
|
| Average | 68.3 | |
|
|
|
From the [Occiglot Euro LLM Leaderboard](https://huggingface.co/spaces/occiglot/euro-llm-leaderboard) (excerpt, sorted by the German average):
|
| Model | 🇪🇺 Average ⬆️ | 🇩🇪 DE | 🇬🇧 EN | 🇬🇧ARC EN | 🇬🇧TruthfulQA EN | 🇬🇧Belebele EN | 🇬🇧HellaSwag EN | 🇬🇧MMLU EN | 🇩🇪ARC DE | 🇩🇪TruthfulQA DE | 🇩🇪Belebele DE | 🇩🇪HellaSwag DE | 🇩🇪MMLU DE | |
|
|----------------------------------------------|----------------|--------|--------|-------------|------------------|----------------|----------------|------------|-------------|------------------|----------------|----------------|------------| |
|
| mistral-community/Mixtral-8x22B-v0.1 | 68.3 | 66.81 | 72.87 | 70.56 | 52.29 | 93.89 | 70.41 | 77.17 | 63.9 | 29.31 | 92.44 | 77.9 | 70.49 | |
|
| **cstr/Spaetzle-v85-7b** | 63.26 | 61.11 | 71.94 | 70.48 | 67.16 | 90.33 | 68.54 | 63.17 | 58.43 | 36.93 | 84.22 | 70.62 | 55.36 | |
|
| cstr/Spaetzle-v60-7b | 63.32 | 60.95 | 71.65 | 69.88 | 66.24 | 90.11 | 68.43 | 63.59 | 58 | 37.31 | 84.22 | 70.09 | 55.11 | |
|
| VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct| 64.49 | 60.07 | 74.71 | 74.49 | 66.19 | 91.67 | 74.55 | 66.65 | 59.37 | 29.57 | 88.56 | 66.43 | 56.44 | |
|
| seedboxai/Llama-3-KafkaLM-8B-v0.1 | 62.27 | 59.67 | 69.75 | 69.03 | 58.14 | 90.78 | 64.35 | 66.43 | 57.66 | 30.33 | 85.89 | 66.88 | 57.58 | |
|
| cstr/llama3-8b-spaetzle-v33 | 62.75 | 59.56 | 70.68 | 69.54 | 59.31 | 91.44 | 66.04 | 67.06 | 57.06 | 28.55 | 87.56 | 66.7 | 57.92 | |
|
|
|
### AGIEval |
|
| Task |Version| Metric |Value| |Stderr| |
|
|------------------------------|------:|--------|----:|---|-----:| |
|
|agieval_aqua_rat | 0|acc |23.23|± | 2.65| |
|
| | |acc_norm|22.44|± | 2.62| |
|
|agieval_logiqa_en | 0|acc |37.33|± | 1.90| |
|
| | |acc_norm|37.94|± | 1.90| |
|
|agieval_lsat_ar | 0|acc |25.22|± | 2.87| |
|
| | |acc_norm|23.04|± | 2.78| |
|
|agieval_lsat_lr | 0|acc |49.41|± | 2.22| |
|
| | |acc_norm|50.78|± | 2.22| |
|
|agieval_lsat_rc | 0|acc |64.68|± | 2.92| |
|
| | |acc_norm|63.20|± | 2.95| |
|
|agieval_sat_en | 0|acc |77.67|± | 2.91| |
|
| | |acc_norm|78.16|± | 2.89| |
|
|agieval_sat_en_without_passage| 0|acc |46.12|± | 3.48| |
|
| | |acc_norm|45.15|± | 3.48| |
|
|agieval_sat_math | 0|acc |35.45|± | 3.23| |
|
| | |acc_norm|34.09|± | 3.20| |
|
|
|
Average: 44.35% |
|
|
|
### GPT4All |
|
| Task |Version| Metric |Value| |Stderr| |
|
|-------------|------:|--------|----:|---|-----:| |
|
|arc_challenge| 0|acc |63.82|± | 1.40| |
|
| | |acc_norm|64.76|± | 1.40| |
|
|arc_easy | 0|acc |85.90|± | 0.71| |
|
| | |acc_norm|82.32|± | 0.78| |
|
|boolq | 1|acc |87.61|± | 0.58| |
|
|hellaswag | 0|acc |67.39|± | 0.47| |
|
| | |acc_norm|85.36|± | 0.35| |
|
|openbookqa | 0|acc |38.80|± | 2.18| |
|
| | |acc_norm|48.80|± | 2.24| |
|
|piqa | 0|acc |83.03|± | 0.88| |
|
| | |acc_norm|84.17|± | 0.85| |
|
|winogrande | 0|acc |78.93|± | 1.15| |
|
|
|
Average: 75.99% |
|
|
|
### TruthfulQA |
|
| Task |Version|Metric|Value| |Stderr| |
|
|-------------|------:|------|----:|---|-----:| |
|
|truthfulqa_mc| 1|mc1 |50.80|± | 1.75| |
|
| | |mc2 |67.23|± | 1.49| |
|
|
|
Average: 67.23% |
|
|
|
### Bigbench |
|
| Task |Version| Metric |Value| |Stderr| |
|
|------------------------------------------------|------:|---------------------|----:|---|-----:| |
|
|bigbench_causal_judgement | 0|multiple_choice_grade|54.74|± | 3.62| |
|
|bigbench_date_understanding | 0|multiple_choice_grade|68.29|± | 2.43| |
|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|39.53|± | 3.05| |
|
|bigbench_geometric_shapes | 0|multiple_choice_grade|22.28|± | 2.20| |
|
| | |exact_str_match |12.26|± | 1.73| |
|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|32.80|± | 2.10| |
|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|23.00|± | 1.59| |
|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|59.00|± | 2.84| |
|
|bigbench_movie_recommendation | 0|multiple_choice_grade|45.60|± | 2.23| |
|
|bigbench_navigate | 0|multiple_choice_grade|51.10|± | 1.58| |
|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|70.10|± | 1.02| |
|
|bigbench_ruin_names | 0|multiple_choice_grade|52.68|± | 2.36| |
|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|33.57|± | 1.50| |
|
|bigbench_snarks | 0|multiple_choice_grade|71.27|± | 3.37| |
|
|bigbench_sports_understanding | 0|multiple_choice_grade|74.54|± | 1.39| |
|
|bigbench_temporal_sequences | 0|multiple_choice_grade|40.00|± | 1.55| |
|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|21.52|± | 1.16| |
|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.86|± | 0.94| |
|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|59.00|± | 2.84| |
|
|
|
Average: 46.55% |
|
|
|
Average score: 58.53% ((44.35 + 75.99 + 67.23 + 46.55) / 4)
|
|
|
## 🧩 Configuration |
|
|
|
```yaml |
|
models: |
|
- model: cstr/Spaetzle-v84-7b |
|
# no parameters necessary for base model |
|
- model: cstr/Spaetzle-v80-7b |
|
parameters: |
|
density: 0.65 |
|
weight: 0.2 |
|
- model: cstr/Spaetzle-v79-7b |
|
parameters: |
|
density: 0.65 |
|
weight: 0.2 |
|
- model: cstr/Spaetzle-v81-7b |
|
parameters: |
|
density: 0.65 |
|
weight: 0.2 |
|
- model: cstr/Spaetzle-v71-7b |
|
parameters: |
|
density: 0.65 |
|
weight: 0.2 |
|
merge_method: dare_ties |
|
base_model: cstr/Spaetzle-v84-7b |
|
parameters: |
|
int8_mask: true |
|
dtype: bfloat16 |
|
random_seed: 0 |
|
tokenizer_source: base |
|
``` |
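
The merge can also be reproduced outside the LazyMergekit notebook by feeding the YAML above to mergekit directly. Below is a minimal sketch using mergekit's Python entry points (`MergeConfiguration`, `MergeOptions`, `run_merge`, as described in the mergekit README); the config filename and output directory are placeholders, not files shipped with this repository.

```python
# Minimal sketch: run the merge defined in the YAML above with mergekit.
# Assumes mergekit is installed and the configuration is saved as
# "spaetzle-v85.yaml"; the output path is a placeholder.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("spaetzle-v85.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Spaetzle-v85-7b",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU if one is available
        copy_tokenizer=True,             # place a tokenizer in the output directory
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```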
|
|
|
## 💻 Usage |
|
|
|
```python |
|
# Install dependencies first (in a notebook: !pip install -qU transformers accelerate)
|
|
|
from transformers import AutoTokenizer |
|
import transformers |
|
import torch |
|
|
|
model = "cstr/Spaetzle-v85-7b" |
|
messages = [{"role": "user", "content": "What is a large language model?"}] |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model)

# Format the chat messages with the model's chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
# Build a text-generation pipeline; device_map="auto" places the model on the available device(s)
pipeline = transformers.pipeline(
|
"text-generation", |
|
model=model, |
|
torch_dtype=torch.float16, |
|
device_map="auto", |
|
) |
|
|
|
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95) |
|
print(outputs[0]["generated_text"]) |
|
``` |
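
The int4 numbers in the evaluation section come from Intel's quantization pipeline; as a separate, hedged illustration, the model can also be loaded in 4-bit on the fly with bitsandbytes via transformers (NF4, not the int4-inc scheme used for the leaderboard results). Generation parameters below are illustrative.

```python
# Minimal 4-bit inference sketch with bitsandbytes (NF4 on-the-fly quantization).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "cstr/Spaetzle-v85-7b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# German prompt, since the model targets both German and English
messages = [{"role": "user", "content": "Was ist ein großes Sprachmodell?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```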