Update README.md
---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
tags:
- merge
- mergekit
- qwen2
- chat
- conversational
language:
- en
- zh
library_name: transformers
---
# Qwen1.5-124B-Chat-Merge

**This is a 124B frankenmerge of [Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat), created by interleaving layers of [Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat) with itself using mergekit.**

*Inspired by other frankenmerge models such as [**goliath-120b**](https://huggingface.co/alpindale/goliath-120b) and [**miqu-1-120b**](https://huggingface.co/wolfram/miqu-1-120b).*

## Quantization

*Coming soon...*
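## Usage

Since the card lists `library_name: transformers`, a standard transformers chat sketch should apply. The model id below is a placeholder of my own, not confirmed by this card — substitute the actual repository id. Note that a 124B model in float16 needs on the order of 250 GB of accelerator memory, so the loading call is wrapped in a function rather than run at import time.

```python
# Minimal transformers usage sketch for the merged chat model.
# The model id is a placeholder -- replace it with this repository's id.
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_reply(prompt: str, model_id: str = "Qwen1.5-124B-Chat-Merge") -> str:
    """Run one chat turn. Requires enough GPU memory for a 124B fp16 model."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    # Build the chat prompt with the model's own chat template.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```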
## Merge configuration

The model was merged with the following mergekit YAML configuration:

```yaml
dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [10, 30]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [20, 40]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [40, 60]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [50, 70]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [60, 80]
    model: Qwen/Qwen1.5-72B-Chat
```
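As a back-of-the-envelope sanity check (my own illustration, not part of the original card): the seven overlapping 20-layer slices stack into a 140-layer model from the 80-layer base, which is roughly how the parameter count grows from 72B toward ~124B.

```python
# Count the layers produced by the passthrough merge config above.
# Each slice copies its full layer_range from the 80-layer base model.
slices = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]

merged_layers = sum(end - start for start, end in slices)
print(merged_layers)  # 7 slices x 20 layers each = 140 layers
```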
## Performance

*Note: I don't have the resources to run formal benchmarks, and I haven't used the model extensively, so my test results may not be entirely accurate.*

In most of my own (subjective) tests, including comprehension, reasoning, and coherence, it performs better than the 72B version.
59 |
+
|
60 |
+
**-Thanks**
|
61 |
+
* 1.The tool used to merge this model [mergekit](https://github.com/arcee-ai/mergekit)
|
62 |
+
* 2.Qwen team for the excellent base models.
|