Achieve an Optimal Merged Model from One Base Model and Two Fine-tuned Models!
What is the best way to merge one base model and two fine-tuned models?
This may well be the best answer available at the present stage!
This is not a casual release, but the distilled result of countless merging experiments!
Here is the formula for the previous generation:
```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
```
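As background, here is a rough per-tensor sketch in Python of what a della merge computes: stochastically drop low-magnitude entries of each task vector, rescale the survivors, then fuse the task vectors with TIES-style sign election and scale the result by lambda. Everything here is illustrative, not mergekit's actual code; the function name is hypothetical and the drop rule is a toy stand-in for DELLA's real MAGPRUNE scheme.

```python
import torch

def della_merge_tensor(base, tuned, weights, density=1.0, lam=0.9):
    """Toy per-tensor sketch of a DELLA-style merge (hypothetical helper;
    mergekit's real implementation differs in detail)."""
    w = torch.tensor(weights).view(-1, *([1] * base.dim()))
    deltas = torch.stack([t - base for t in tuned])  # task vectors

    if density < 1.0:
        # Drop delta entries stochastically, low magnitudes more often,
        # and rescale survivors so the expected delta is preserved
        # (a simplified stand-in for DELLA's MAGPRUNE).
        pruned = []
        for d in deltas:
            ranks = d.abs().flatten().argsort().argsort().float() + 1
            p_keep = (2 * density * ranks / ranks.numel()).clamp(1e-3, 1.0)
            p_keep = p_keep.reshape(d.shape)
            mask = torch.bernoulli(p_keep)
            pruned.append(d * mask / p_keep)
        deltas = torch.stack(pruned)

    # TIES-style sign election: keep only entries that agree with the
    # weight-summed dominant sign, then take their weighted mean.
    elected = (w * deltas).sum(0).sign()
    agree = (deltas.sign() == elected).float() * w
    merged = (deltas * agree).sum(0) / agree.sum(0).clamp(min=1e-8)

    # lambda rescales the merged task vector before it is re-applied.
    return base + lam * merged
```

Note that with `density: 1`, as in the configs here, nothing is dropped, so della reduces to sign-elected weighted averaging of the task vectors scaled by `lambda: 0.9`.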
This formula was widely used when merging the previous generation of models. However, it has some deficiencies:
1. Relatively little of the base model's knowledge is retained.
2. Math and coding abilities decline after the merge.
And here is the formula for this generation. It first builds four intermediate merges, two with della and two with ties:
```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della
```

```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della-1M
```

```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties
```

```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties-1M
```
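Each of the four intermediate configs is a standalone merge: it can be run with the `mergekit-yaml` CLI or from Python, and the output published, which is where the `mergekit-community/...` repos referenced in the final step come from. A minimal sketch of running one of them from Python, assuming a recent mergekit (the call mirrors the usage example in mergekit's README; the config path here is hypothetical):

```python
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load one of the intermediate configs above (hypothetical filename).
with open("qwen2.5-7b-della.yml") as f:
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

# Run the merge and write the result to ./Qwen2.5-7B-della.
run_merge(
    config,
    out_path="./Qwen2.5-7B-della",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
    ),
)
```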
The final step then folds the four intermediates, together with the two instruct models, back into the base with model_stock:

```yaml
merge_method: model_stock
base_model: Qwen/Qwen2.5-7B
models:
  - model: mergekit-community/Qwen2.5-7B-della
  - model: mergekit-community/Qwen2.5-7B-della-1M
  - model: mergekit-community/Qwen2.5-7B-ties
  - model: mergekit-community/Qwen2.5-7B-ties-1M
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Qwen/Qwen2.5-7B-Instruct
tokenizer_source: base
int8_mask: true
normalize: true
dtype: float16
```
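For intuition, here is a similar per-tensor sketch of what model_stock does (after Model Stock, Jang et al. 2024; again illustrative rather than mergekit's exact code, and the function name is hypothetical). The plain average of the fine-tuned weights, here the six models listed above, is interpolated back toward the base in proportion to how well their task vectors agree:

```python
import torch
import torch.nn.functional as F

def model_stock_tensor(base, tuned):
    """Toy per-tensor sketch of a model_stock merge (illustrative only)."""
    deltas = [(t - base).flatten() for t in tuned]
    n = len(deltas)

    # Average pairwise cosine similarity between the task vectors,
    # clamped for numerical safety in this toy version.
    cos = torch.stack([
        F.cosine_similarity(deltas[i], deltas[j], dim=0)
        for i in range(n) for j in range(i + 1, n)
    ]).mean().clamp(0.0, 0.99)

    # Interpolation factor from the Model Stock paper:
    # t = N*cos(theta) / (1 + (N-1)*cos(theta)).
    t = n * cos / (1 + (n - 1) * cos)

    # Pull the average of the fine-tuned weights back toward the base.
    return t * torch.stack(tuned).mean(0) + (1 - t) * base
```

The more the six inputs agree (cosine close to 1), the closer t gets to 1 and the more the merge trusts their average; disagreement pulls the result back toward the base weights.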
Aside from a slight decrease in instruction following, this recipe achieves significant improvements in every other aspect.
It will also be used in developing the next generation of YOYO models.
YOYO-AI releases not only merged models with excellent performance but also the complete, high-quality merging formulas behind them, hoping in this way to advance model merging technology in the open-source community!
If you use this formula when merging your own models, that will be the greatest support for YOYO-AI!