I’m excited to introduce my third-generation model:

# Qwen2.5-14B-1M-YOYO-V3

This time, I’m not only releasing the model but also sharing some model merging techniques, which might be even more valuable than the model itself.

Let’s start by looking at the initial merge configuration (YAML):
```yaml
merge_method: model_stock
base_model: Qwen/Qwen2.5-14B
models:
  - model: Qwen/Qwen2.5-14B-Instruct
  - model: Qwen/Qwen2.5-14B-Instruct-1M
dtype: bfloat16
```
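
As background on the method itself: `model_stock` interpolates between the base model and the average of the fine-tuned models, picking the interpolation ratio from the geometry (the angle) of the fine-tuned models' weight deltas. Below is a toy per-tensor sketch of the published rule; this is my own illustration, and mergekit's actual implementation may differ in detail:

```python
# Toy per-tensor sketch of model_stock (Model Stock, Jang et al. 2024):
# interpolate between the base weights and the average of the fine-tuned
# weights, with a ratio derived from the angle between the deltas.
# Illustration only -- not mergekit's exact code.
import torch
import torch.nn.functional as F

def model_stock_tensor(base: torch.Tensor,
                       tuned: list[torch.Tensor]) -> torch.Tensor:
    k = len(tuned)
    deltas = [(t - base).flatten() for t in tuned]
    # Average pairwise cosine similarity between fine-tuned deltas,
    # clamped to [0, 1]: the rule assumes the fine-tunes point in
    # broadly the same direction away from the base.
    cos = torch.stack([
        F.cosine_similarity(deltas[i], deltas[j], dim=0)
        for i in range(k) for j in range(i + 1, k)
    ]).mean().clamp(0.0, 1.0)
    # Interpolation ratio from the paper: t = k*cos / (1 + (k-1)*cos).
    t = k * cos / (1 + (k - 1) * cos)
    w_avg = torch.stack(tuned).mean(dim=0)
    return t * w_avg + (1 - t) * base

base = torch.randn(64, 64)
tuned = [base + 0.1 * torch.randn(64, 64) for _ in range(2)]
merged = model_stock_tensor(base, tuned)
```
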
Seems straightforward, right? But the merged model occasionally suffered from **uncontrollable outputs**, likely due to the large divergence between the instruction-tuned models and the base model.
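
It helps to make "divergence" measurable. The sketch below is my own addition (not part of the original recipe): it compares a tuned checkpoint against the base via the mean relative L2 distance over their shared weight tensors, so you can rank candidate models by how far they sit from the base.

```python
# Rough sketch: quantify parameter-space "divergence" between a tuned
# checkpoint and its base. Loading two 14B models takes a lot of RAM;
# the same idea works on any smaller pair of checkpoints.
import torch
from transformers import AutoModelForCausalLM

def state_dict_of(name: str):
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    return model.state_dict()

base = state_dict_of("Qwen/Qwen2.5-14B")
tuned = state_dict_of("Qwen/Qwen2.5-14B-Instruct")

# Relative L2 distance per tensor, averaged over tensors both models share.
dists = []
for key, b in base.items():
    t = tuned.get(key)
    if t is None or t.shape != b.shape:
        continue
    b32, t32 = b.float(), t.float()
    dists.append(((t32 - b32).norm() / (b32.norm() + 1e-8)).item())

print(f"mean relative L2 divergence: {sum(dists) / len(dists):.4f}")
```
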
To address this, I first tried integrating a fine-tuned model with smaller divergence from the base model, like **Virtuoso-Small-v2**.

This gave rise to [Qwen2.5-14B-YOYO-latest-V2](https://huggingface.co/YOYO-AI/Qwen2.5-14B-YOYO-latest-V2):
```yaml
merge_method: model_stock
base_model: Qwen/Qwen2.5-14B
models:
  - model: Qwen/Qwen2.5-14B-Instruct
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  - model: arcee-ai/Virtuoso-Small-v2
dtype: bfloat16
name: Qwen2.5-14B-YOYO-latest-V2
```

This reduced runaway outputs but still left the model unstable.

Through experimentation, I found that merging **"high-divergence"** models into **"low-divergence"** models (those close to the base) using the `della` method produced more stable and better-performing results.
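
For intuition about what `della` is doing: it works on the delta between each model and the base, randomly drops delta elements with a keep probability tied to their magnitude, rescales the survivors so the result stays unbiased, and adds the merged delta back scaled by `lambda`. Here is a toy single-tensor sketch; it is my simplification for intuition, not mergekit's exact algorithm:

```python
# Toy single-tensor sketch of della-style merging: magnitude-aware
# random dropping of delta parameters, rescaling of survivors, then
# lambda-scaled addition. Simplified -- not mergekit's exact algorithm.
import torch

def della_tensor(base: torch.Tensor, tuned: torch.Tensor,
                 density: float = 0.7, lam: float = 0.9,
                 eps: float = 0.1) -> torch.Tensor:
    delta = tuned - base
    if density >= 1.0:
        # density: 1 disables dropping entirely, so only the lambda
        # scaling remains -- the regime all the configs below run in.
        return base + lam * delta
    flat = delta.flatten()
    # Rank deltas by magnitude: larger deltas get a higher keep
    # probability, centred on `density` with spread `eps`.
    ranks = flat.abs().argsort().argsort().float() / (flat.numel() - 1)
    keep = (density - eps / 2 + eps * ranks).clamp(0.01, 1.0)
    mask = torch.bernoulli(keep)
    # Rescale survivors so the sparsified delta is unbiased in expectation.
    sparse = torch.where(mask.bool(), flat / keep, torch.zeros_like(flat))
    return base + lam * sparse.reshape(delta.shape)

base = torch.randn(256, 256)
tuned = base + 0.05 * torch.randn(256, 256)
merged = della_tensor(base, tuned)
```

Note that every `della` config below uses `density: 1` and `weight: 1`, so no dropping actually happens; in that regime the merge effectively grafts the high-divergence model's delta onto the low-divergence base, scaled by `lambda: 0.9`.
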
## Key models used:

*1. Low-divergence, high-performance models:*

- Virtuoso-Small-v2
- Blossom-V6-14B

*2. High-divergence, instruction-focused models:*

- Qwen2.5-14B-Instruct
- Qwen2.5-14B-Instruct-1M

## DELLA Merge Configuration:
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-YOYO-della1
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-YOYO-della2
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-YOYO-della3
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-YOYO-della4
```

This approach yielded four variants:
- `Qwen2.5-14B-YOYO-della1`
- `Qwen2.5-14B-YOYO-della2`
- `Qwen2.5-14B-YOYO-della3`
- `Qwen2.5-14B-YOYO-della4`

## Base Model:
To enhance the base model's roleplay and creative-writing capabilities, I applied the same strategy:
```yaml
models:
  - model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: EVA-Qwen2.5-14B-base
```

Next, I extended the context length using the SCE method:
```yaml
merge_method: sce
models:
  - model: EVA-Qwen2.5-14B-base
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-pro
```

## Final Merge Step:
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-pro
models:
  - model: Qwen2.5-14B-YOYO-della1
  - model: Qwen2.5-14B-YOYO-della2
  - model: Qwen2.5-14B-YOYO-della3
  - model: Qwen2.5-14B-YOYO-della4
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-1M-YOYO-V3
```
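
If you want to reproduce any stage of this pipeline, each YAML block above is a standalone mergekit config; the multi-stage recipe simply feeds one stage's output directory in as a model or base path for the next. Below is a minimal sketch using mergekit's Python API, with hypothetical file and output names; option names can vary between mergekit versions, and the `mergekit-yaml` CLI is the equivalent one-liner:

```python
# Minimal sketch of running one of the configs above with mergekit.
# Assumes mergekit's documented API (MergeConfiguration, MergeOptions,
# run_merge); check the current docs if option names have changed.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Hypothetical filename: save whichever YAML block you want to run.
with open("merge-config.yaml", encoding="utf-8") as fp:
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    out_path="./merged-model",  # hypothetical output directory
    options=MergeOptions(
        cuda=True,            # merge on GPU if one is available
        copy_tokenizer=True,  # copy the tokenizer into the output
        lazy_unpickle=True,   # reduce peak memory while reading shards
    ),
)
```
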
Feel free to adapt these strategies for your own merging experiments! 🚀