Achieving the Optimal Merged Model from One Base Model and Two Fine-tuned Models!

What is the best way to merge one base model and two fine-tuned models?

This might be the best answer available at the present stage!

Qwen2.5-7B-YOYO-super

Qwen2.5-14B-YOYO-super

This is not a release made on a whim, but the best result of countless merging experiments!

Here is the formula for the previous generation:

models:  
  - model: Qwen/Qwen2.5-7B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-7B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-7B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base
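In this recipe, density is roughly the fraction of each fine-tuned model's delta parameters that della keeps, weight scales each model's contribution, and lambda scales the merged deltas before they are added back onto the base model. To reproduce a merge like this, the config can be run with mergekit's command-line tool; a minimal sketch, assuming mergekit is installed and the config above is saved as della.yaml (a hypothetical file name):

pip install mergekit
mergekit-yaml della.yaml ./output-model-directory --cuda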

This formula was widely used when merging the previous generation of models.

However, it has some deficiencies:

1. It retains relatively little of the base model's knowledge.

2. Mathematical and coding abilities decline.

And here is the formula for this generation. Instead of a single merge, it chains four intermediate merges (a della and a ties merge of each instruct model against the base) into a final model_stock merge:

models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della-1M
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties-1M
---
merge_method: model_stock
base_model: Qwen/Qwen2.5-7B
models:
  - model: mergekit-community/Qwen2.5-7B-della
  - model: mergekit-community/Qwen2.5-7B-della-1M
  - model: mergekit-community/Qwen2.5-7B-ties
  - model: mergekit-community/Qwen2.5-7B-ties-1M
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Qwen/Qwen2.5-7B-Instruct
tokenizer_source: base
parameters:
  int8_mask: true
  normalize: true
dtype: float16
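Here the name key labels each stage's output, and the final model_stock stage averages the four intermediate merges together with both instruct models, which is intended to address the deficiencies listed above. One way to run the pipeline is stage by stage; a minimal sketch, assuming each YAML document above is saved to its own file (the file names here are hypothetical):

mergekit-yaml della.yaml ./Qwen2.5-7B-della --cuda
mergekit-yaml della-1M.yaml ./Qwen2.5-7B-della-1M --cuda
mergekit-yaml ties.yaml ./Qwen2.5-7B-ties --cuda
mergekit-yaml ties-1M.yaml ./Qwen2.5-7B-ties-1M --cuda
mergekit-yaml model-stock.yaml ./Qwen2.5-7B-YOYO-super --cuda

When reproducing this locally, point the model paths in the final stage at the four local output directories instead of the mergekit-community Hub repositories.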

Apart from a slight decrease in instruction following, it achieves significant improvements in every other aspect.

This formula will also be used in the development of the next generation of YOYO models.

YOYO-AI not only releases merged models with excellent performance, but also publishes the complete, high-quality merging formula behind them, in the hope of advancing model-merging technology across the open-source community!

If you use this formula when merging your own models, that is the greatest support you can give YOYO-AI!
