mlabonne committed
Commit f95fcf2
1 Parent(s): 00ee83e

Update README.md

Files changed (1)
  1. README.md +55 -12
README.md CHANGED
@@ -1,27 +1,45 @@
  ---
- base_model:
- - Qwen/Qwen2.5-32B-Instruct
  library_name: transformers
  tags:
  - mergekit
  - merge
-
  ---
- # merge2

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the passthrough merge method.

- ### Models Merged

- The following models were included in the merge:
- * [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)

- ### Configuration

  The following YAML configuration was used to produce this model:

  ---
+ license: other
+ license_name: tongyi-qianwen
+ license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
+ language:
+ - en
+ pipeline_tag: text-generation
  library_name: transformers
  tags:
  - mergekit
  - merge
+ - lazymergekit
+ base_model:
+ - Qwen/Qwen2.5-32B-Instruct
  ---

+ # BigQwen2.5-Echo-47B-Instruct
+
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/98GiKtmH1AtHHbIbOUH4Y.jpeg)

+ BigQwen2.5-Echo-47B-Instruct is a [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) self-merge made with [MergeKit](https://github.com/arcee-ai/mergekit/tree/main).

+ ## 🔉 Echo Merge

+ I've tried a more gradual approach with a **distributed repetition pattern**. Instead of replicating blocks of 8 or more layers, I'm replicating individual layers within these blocks:
+ - First 8 layers: No replication
+ - Next 8 layers: Replicate 2 layers (first one, middle one)
+ - Next 8 layers: Replicate 4 layers (1st, 3rd, 5th, 7th)
+ - Next 8 layers: Replicate 8 layers (all of them)
+ - Next 8 layers: Replicate 8 layers (all of them)
+ - Next 8 layers: Replicate 4 layers (1st, 3rd, 5th, 7th)
+ - Next 8 layers: Replicate 2 layers (first one, middle one)
+ - Last 8 layers: No replication

+ I used this string to visualize it, where 0s are original layers and 1s are duplicated ones (the order doesn't matter):
+ ```
+ 00000000 1000010000 100100100100 1010101010101010 1010101010101010 100100100100 1000010000 00000000
+ ```
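
To make the pattern concrete, here is a small illustrative script (my sketch, not the shipped configuration, which follows in the Configuration section) that expands the per-block duplication offsets into the kind of overlapping `layer_range` slices a mergekit passthrough merge works with. The 64-layer count corresponds to Qwen2.5-32B-Instruct; the offsets and helper names are assumptions chosen to mirror the list above.

```python
# Illustrative sketch (not the shipped config): expand the echo pattern above into
# overlapping layer ranges, the way a mergekit passthrough merge references layers.
# Assumes the 64 decoder layers of Qwen2.5-32B-Instruct, split into 8-layer blocks.
N_LAYERS = 64
BLOCK = 8

# For each 8-layer block, the offsets (within the block) of layers that get a second copy.
DUPLICATED_OFFSETS = [
    [],                  # layers 0-7: no replication
    [0, 4],              # layers 8-15: first and middle layer
    [0, 2, 4, 6],        # layers 16-23: every other layer
    list(range(BLOCK)),  # layers 24-31: all layers
    list(range(BLOCK)),  # layers 32-39: all layers
    [0, 2, 4, 6],        # layers 40-47
    [0, 4],              # layers 48-55
    [],                  # layers 56-63: no replication
]
assert len(DUPLICATED_OFFSETS) * BLOCK == N_LAYERS

# Build the final layer stack: every original layer once, duplicated layers twice.
layer_order = []
for block_idx, offsets in enumerate(DUPLICATED_OFFSETS):
    for offset in range(BLOCK):
        layer = block_idx * BLOCK + offset
        layer_order.append(layer)
        if offset in offsets:
            layer_order.append(layer)  # second copy of the same layer

# Compress the stack into contiguous [start, end) ranges; every repeated index
# starts a new range, so a duplicated layer ends up in two adjacent slices.
ranges = []
start = layer_order[0]
for prev, cur in zip(layer_order, layer_order[1:]):
    if cur != prev + 1:
        ranges.append((start, prev + 1))
        start = cur
ranges.append((start, layer_order[-1] + 1))

print(f"{len(layer_order)} layers after merging")  # 64 originals + 28 copies = 92
for lo, hi in ranges:
    print(f"- sources:\n  - model: Qwen/Qwen2.5-32B-Instruct\n    layer_range: [{lo}, {hi}]")
```

Each duplicated layer simply appears in two adjacent ranges, which is all a passthrough merge needs in order to stack it twice; the authoritative slice list is the YAML configuration below.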

+ The main idea is that the difference between a middle layer's input and output is quite small, so replicating a middle layer has only a small impact on the model's output.
+ The additional layers increase the model's capacity without breaking the information flow, which is what often makes naive self-merges "insane".
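
To check that intuition yourself, you can compare consecutive hidden states layer by layer. The snippet below is a rough, illustrative probe and not part of the original card: it uses the small Qwen/Qwen2.5-0.5B-Instruct and an arbitrary prompt purely so it runs on modest hardware.

```python
# Rough probe of the claim above: how similar are consecutive hidden states?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
with torch.no_grad():
    # hidden_states holds the embedding output plus the output of every decoder layer
    hidden = model(**inputs, output_hidden_states=True).hidden_states

for i in range(len(hidden) - 1):
    # cosine similarity between consecutive hidden states, i.e. roughly layer i's input vs. output
    sim = torch.nn.functional.cosine_similarity(hidden[i], hidden[i + 1], dim=-1).mean().item()
    print(f"layer {i:2d}: cos(input, output) = {sim:.3f}")
```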
+
+ ## 🧩 Configuration

  The following YAML configuration was used to produce this model:

@@ -178,3 +196,28 @@ slices:
  merge_method: passthrough
  dtype: bfloat16
  ```
+
+ ## 💻 Usage
+
+ ```python
+ !pip install -qU transformers accelerate
+
+ from transformers import AutoTokenizer
+ import transformers
+ import torch
+
+ model = "mlabonne/BigQwen2.5-Echo-47B-Instruct"
+ messages = [{"role": "user", "content": "What is a large language model?"}]
+
+ # Build the prompt with the model's chat template
+ tokenizer = AutoTokenizer.from_pretrained(model)
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ # Load the weights in float16, spread across the available devices
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # Sample a response
+ outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ ```