Replaced Zephyr with Airoboros 2.2 and OpenOrca with SynthIA in the mix, to see whether merging Mistral models that all use the same prompt format gives a better result.
Description
This repo contains fp16 files of Mistral-11B-SynthIAirOmniMix.
Models used
- SynthIA-7B-v1.5
- Mistral-7B-v0.1-Open-Platypus
- CollectiveCognition-v1.1-Mistral-7B
- airoboros-mistral2.2-7b
Prompt template
3 out of 4 models in this merge use the same prompt format.
The best one should be this one, since Zephyr and OpenOrca are out of the merge:
(SYSTEM: {context}) - Not mandatory
USER: {prompt}
ASSISTANT:
But this one may work too:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
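The two templates above can be assembled programmatically. This is a minimal sketch, not the author's code: the function names are hypothetical, and the exact line breaks and blank lines between fields are an assumption, since the card does not spell them out.

```python
# Hypothetical helpers for the two prompt formats listed in this card.
# Assumption: fields are separated by single newlines (USER/ASSISTANT format)
# and by blank lines (Alpaca-style format).

ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_user_assistant_prompt(prompt, context=None):
    """Build the SYSTEM/USER/ASSISTANT prompt; the SYSTEM line is optional."""
    lines = []
    if context:
        lines.append(f"SYSTEM: {context}")
    lines.append(f"USER: {prompt}")
    lines.append("ASSISTANT:")
    return "\n".join(lines)


def build_alpaca_prompt(prompt):
    """Build the Alpaca-style instruction prompt."""
    return f"{ALPACA_HEADER}\n\n### Instruction:\n{prompt}\n\n### Response:\n"


if __name__ == "__main__":
    print(build_user_assistant_prompt("Name three primary colors."))
    print(build_alpaca_prompt("Name three primary colors."))
```

The string returned by either helper would be passed to whatever inference backend you use; generation should continue right after `ASSISTANT:` or `### Response:`.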
The secret sauce
Mistral-11B-SynthIAOpenPlatypus :
slices:
  - sources:
      - model: "/content/drive/MyDrive/SynthIA-7B-v1.5-bf16"
        layer_range: [0, 24]
  - sources:
      - model: akjindal53244/Mistral-7B-v0.1-Open-Platypus
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
Mistral-11B-CC-Airo :
slices:
  - sources:
      - model: "/content/drive/MyDrive/CC-v1.1-7B-bf16"
        layer_range: [0, 24]
  - sources:
      - model: "/content/drive/MyDrive/Mistral-7B-Airoboros-2.2-bf16"
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
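To see why each passthrough merge ends up with 48 layers and roughly 11B parameters, the layer stacking can be sketched as below. This only illustrates the bookkeeping, not how mergekit does it internally. The function names are hypothetical, and the model dimensions (hidden size 4096, MLP size 14336, 32 attention heads, 8 key/value heads, vocabulary 32000) are assumed from the base Mistral-7B-v0.1 configuration rather than taken from this card.

```python
# Sketch of passthrough stacking: slices are concatenated in order.
# layer_range is treated as half-open, i.e. [0, 24] means layers 0..23.

def stack_layers(slices):
    """Return the (model, original_layer_index) list of the stacked model."""
    stacked = []
    for name, (start, end) in slices:
        stacked.extend((name, i) for i in range(start, end))
    return stacked


def total_params(n_layers, hidden=4096, intermediate=14336,
                 n_heads=32, n_kv_heads=8, vocab=32000):
    """Rough parameter count for a Mistral-style decoder (assumed dimensions)."""
    kv_dim = (hidden // n_heads) * n_kv_heads
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v projections
    mlp = 3 * hidden * intermediate                   # gate, up, down projections
    norms = 2 * hidden                                # two RMSNorm weights per layer
    embeddings = 2 * vocab * hidden                   # embed_tokens + lm_head
    return n_layers * (attn + mlp + norms) + embeddings + hidden


stacked = stack_layers([
    ("SynthIA-7B-v1.5", (0, 24)),
    ("Mistral-7B-v0.1-Open-Platypus", (8, 32)),
])

if __name__ == "__main__":
    print(len(stacked), "layers")
    print(f"{total_params(len(stacked)) / 1e9:.2f}B parameters (estimate)")
```

Layers 8 to 23 appear twice in the stack, once from each source model, which is where the extra depth comes from.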
Mistral-11B-SynthIAirOmniMix :
slices:
  - sources:
      - model: Mistral-11B-SynthIAOpenPlatypus
        layer_range: [0, 48]
      - model: Mistral-11B-CC-Airo
        layer_range: [0, 48]
merge_method: slerp
base_model: Mistral-11B-OpenOrcaPlatypus
parameters:
  t:
    - filter: lm_head
      value: [0.75]
    - filter: embed_tokens
      value: [0.75]
    - filter: self_attn
      value: [0.75, 0.25]
    - filter: mlp
      value: [0.25, 0.75]
    - filter: layernorm
      value: [0.5, 0.5]
    - filter: modelnorm
      value: [0.75]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
I used mergekit for all the manipulations described here.
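For readers unfamiliar with the slerp step, here is a rough pure-Python sketch of spherical linear interpolation and of how a two-value `t` list such as `[0.75, 0.25]` could vary across layers. It is an illustration under assumptions, not mergekit's implementation: the function names are hypothetical, the list is assumed to be linearly interpolated over the layer index, and `t = 0` is assumed to return the first endpoint.

```python
import math


def layer_t(values, layer_idx, num_layers):
    """Linearly interpolate a list of t anchors across the layer index (assumed behavior)."""
    if len(values) == 1 or num_layers == 1:
        return values[0]
    pos = layer_idx / (num_layers - 1) * (len(values) - 1)
    lo = min(int(pos), len(values) - 2)
    frac = pos - lo
    return values[lo] * (1 - frac) + values[lo + 1] * frac


def slerp(t, v0, v1, eps=1e-8):
    """Spherical interpolation between two flat weight vectors (lists of floats)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    linear = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        return linear  # direction undefined, fall back to plain lerp
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > 0.9995:
        return linear  # nearly colinear, lerp is numerically safer
    theta = math.acos(dot)
    w0 = math.sin((1 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # self_attn uses t going from 0.75 (first layer) to 0.25 (last layer)
    for idx in (0, 24, 47):
        print(idx, layer_t([0.75, 0.25], idx, 48))
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions, the `self_attn` and `mlp` filters in the config run in opposite directions across depth, so the attention weights lean toward one parent in early layers while the MLP weights lean toward the other.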
Some scoring I did myself
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
arc_challenge | 0 | acc | 0.5410 | ± | 0.0146 |
 | | acc_norm | 0.5640 | ± | 0.0145 |
arc_easy | 0 | acc | 0.8228 | ± | 0.0078 |
 | | acc_norm | 0.8068 | ± | 0.0081 |
hellaswag | 0 | acc | 0.6274 | ± | 0.0048 |
 | | acc_norm | 0.8167 | ± | 0.0039 |
piqa | 0 | acc | 0.8052 | ± | 0.0092 |
 | | acc_norm | 0.8232 | ± | 0.0089 |
truthfulqa_mc | 1 | mc1 | 0.3905 | ± | 0.0171 |
 | | mc2 | 0.5592 | ± | 0.0155 |
winogrande | 0 | acc | 0.7364 | ± | 0.0124 |
Others
Special thanks to Sushi, to Henky for the machine he gave me for big tasks, and to Charles Goddard for his amazing tool.
If you want to support me, you can here.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 54.56 |
ARC (25-shot) | 62.46 |
HellaSwag (10-shot) | 83.13 |
MMLU (5-shot) | 63.47 |
TruthfulQA (0-shot) | 55.69 |
Winogrande (5-shot) | 76.4 |
GSM8K (5-shot) | 11.9 |
DROP (3-shot) | 28.88 |
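The reported average appears to be the plain mean of the seven benchmark scores in the table. A quick check, with the values copied from the table above:

```python
# Scores copied from the leaderboard table; "Avg." is assumed to be their unweighted mean.
scores = {
    "ARC (25-shot)": 62.46,
    "HellaSwag (10-shot)": 83.13,
    "MMLU (5-shot)": 63.47,
    "TruthfulQA (0-shot)": 55.69,
    "Winogrande (5-shot)": 76.4,
    "GSM8K (5-shot)": 11.9,
    "DROP (3-shot)": 28.88,
}

average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"Avg. = {average:.2f}")
```

The low GSM8K and DROP scores pull the mean well below the other five benchmarks.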