## Model Description
This is an experiment to test merging 14 models using DARE TIES 🦙
We first merged 14 models to produce EmbeddedLLM/Mistral-7B-Merge-14-v0.3, then merged that checkpoint with Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp using gradient SLERP. The resulting model performs quite well but may require further instruction fine-tuning.
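The card does not spell out the DARE-TIES step itself, so here is a minimal NumPy sketch of the core idea: drop and rescale each task vector (DARE), then average only the deltas that agree with the elected sign (TIES). The drop rate and toy tensors are illustrative assumptions; the actual 14-model merge was produced with mergekit, not this code.

```python
import numpy as np

rng = np.random.default_rng(0)

def dare(delta, drop_rate=0.5):
    # DARE: randomly drop entries of a task vector (delta) and rescale the
    # survivors by 1 / (1 - drop_rate) so its expected value is unchanged.
    keep = rng.random(delta.shape) >= drop_rate
    return delta * keep / (1.0 - drop_rate)

def dare_ties(base, finetuned, drop_rate=0.5):
    # Task vectors: each fine-tuned model minus the shared base model.
    deltas = np.stack([dare(m - base, drop_rate) for m in finetuned])
    # TIES sign election: pick the dominant sign per parameter.
    elected = np.sign(deltas.sum(axis=0))
    # Keep only deltas that agree with the elected sign, then average them.
    agree = (np.sign(deltas) == elected) & (deltas != 0)
    count = np.maximum(agree.sum(axis=0), 1)
    merged_delta = np.where(agree, deltas, 0.0).sum(axis=0) / count
    return base + merged_delta

# Toy usage: three "fine-tuned" weight tensors around a common base.
base = np.zeros(4)
models = [base + rng.normal(scale=0.1, size=4) for _ in range(3)]
print(dare_ties(base, models))
```

mergekit's `dare_ties` merge method exposes per-model `weight` and `density` parameters on top of this basic recipe.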
## Open LLM Leaderboard
| Metric | Value |
|---|---|
| Average | 71.19 |
| ARC | 66.81 |
| HellaSwag | 86.15 |
| MMLU | 65.10 |
| TruthfulQA | 58.25 |
| Winogrande | 80.03 |
| GSM8K | 70.81 |
## Chat Template
The model responds to either the ChatML or the Llama-2 chat template.
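For illustration only, here is a minimal Python sketch that builds a ChatML-formatted prompt by hand; the system prompt and user message are placeholders, and the Llama-2 template works analogously with `[INST] ... [/INST]` markers.

```python
# Hand-rolled ChatML prompt; pass the resulting string to your preferred
# generation pipeline (model id and generation settings are not prescribed here).
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("You are a helpful assistant.", "Explain SLERP in one sentence."))
```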
## Merge Configuration
The merge configuration file for this model is shown below:
```yaml
slices:
  - sources:
      - model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
        layer_range: [0, 32]
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.3
        layer_range: [0, 32]
merge_method: slerp
base_model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
tokenizer_source: base
embed_slerp: true
dtype: bfloat16
```
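As an aid to reading the `t` schedule: mergekit interpolates each list of anchor values across the layer range, so the self-attention and MLP tensors get mirrored blending curves between the two models. The sketch below assumes simple piecewise-linear interpolation over the 32 layers; it illustrates the shape of the schedule rather than reproducing mergekit's exact behaviour.

```python
import numpy as np

def layer_t(anchors, num_layers=32):
    # Spread the anchor values evenly over the layer range and linearly
    # interpolate a per-layer SLERP weight t (assumed behaviour, not mergekit code).
    anchor_pos = np.linspace(0.0, 1.0, num=len(anchors))
    layer_pos = np.linspace(0.0, 1.0, num=num_layers)
    return np.interp(layer_pos, anchor_pos, anchors)

self_attn_t = layer_t([0, 0.5, 0.3, 0.7, 1])  # self_attn tensors
mlp_t = layer_t([1, 0.5, 0.7, 0.3, 0])        # mlp tensors (mirrored)
print(self_attn_t.round(2))
print(mlp_t.round(2))
```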