Nohobby committed
Commit 12f2a16 · verified · 1 Parent(s): f5fbd9d

Update README.md

Files changed (1):
  1. README.md +233 -15
README.md CHANGED
@@ -1,30 +1,249 @@
  ---
- base_model:
- - d-rang-d/MS3-megamerge
- - unsloth/Mistral-Small-24B-Instruct-2501
  library_name: transformers
  tags:
  - mergekit
  - merge
-
  ---
- # merge

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

  ## Merge Details
- ### Merge Method

- This model was merged using the [Linear DELLA](https://arxiv.org/abs/2406.11617) merge method using [d-rang-d/MS3-megamerge](https://huggingface.co/d-rang-d/MS3-megamerge) as a base.

- ### Models Merged

- The following models were included in the merge:
- * [unsloth/Mistral-Small-24B-Instruct-2501](https://huggingface.co/unsloth/Mistral-Small-24B-Instruct-2501)

- ### Configuration

- The following YAML configuration was used to produce this model:

  ```yaml
  dtype: bfloat16
@@ -63,5 +282,4 @@ models:
  - filter: down_proj
  value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
  - value: 1
-
- ```
 
  ---
+ language:
+ - en
+ license: apache-2.0
  library_name: transformers
  tags:
  - mergekit
  - merge
+ base_model:
+ - unsloth/Mistral-Small-24B-Base-2501
+ - unsloth/Mistral-Small-24B-Instruct-2501
+ - trashpanda-org/MS-24B-Instruct-Mullein-v0
+ - trashpanda-org/Llama3-24B-Mullein-v1
+ - ArliAI/Mistral-Small-24B-ArliAI-RPMax-v1.4
+ - TheDrummer/Cydonia-24B-v2
+ - estrogen/MS2501-24b-Ink-apollo-ep2
+ - huihui-ai/Mistral-Small-24B-Instruct-2501-abliterated
+ - ToastyPigeon/ms3-roselily-rp-v2
+ - PocketDoc/Dans-DangerousWinds-V1.1.1-24b
+ - ReadyArt/Forgotten-Safeword-24B-V2.2
+ - PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
+ - Undi95/MistralThinker-e2
+ - lemonilia/Mistral-Small-3-Reasoner-s1
+ - arcee-ai/Arcee-Blitz
+ - SicariusSicariiStuff/Redemption_Wind_24B
  ---
+ ***
+ ## Tantum
+
+ >Everything is edible if you are brave enough
+
+ ![That's how we live](https://files.catbox.moe/ergq6n.png)
+
+ ### Overview
+
+ It's kind of hard to judge a 24B model after using a 70B for a while. From some tests, I think it might be better than my ms-22B and qwen-32B merges.
+
+ It has some prose, some character adherence, and... `<think>` tags! It will consistently think if you add a `<think>` tag as a prefill, though it obviously won't think as well as an actual thinking-model distill.
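A minimal sketch of what that prefill can look like with `transformers`, assuming the tokenizer ships a chat template; the repo id, prompt, and sampling values below are placeholders, not settings from this card:

```python
# Hedged sketch: start the assistant turn with "<think>" so the model begins by reasoning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nohobby/MS3-Tantum"  # placeholder repo id, substitute the actual model path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Stay in character and continue the scene."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += "<think>"  # the prefill: generation continues from inside the think block

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```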
+
+ **Settings:**
+
+ Samplers: [Weird preset](https://files.catbox.moe/ccwmca.json) | [Mullein preset](https://files.catbox.moe/0pkv2j.json)
+
+ Prompt format: Mistral-V7 (?)
+
+ ChatML and Llama3 give better results imo. For ChatML that at least makes sense, since Dans-PersonalityEngine and Redemption-Wind were trained on it. But Llama3? No clue.

+ I use [this](https://files.catbox.moe/daluze.json) lorebook for all chats instead of a system prompt for Mistral models.
+
+ ### Quants
+
+ [5_K_S](https://huggingface.co/Nohobby/ignore_MS3-test-Q5_K_S-GGUF/resolve/main/ignore_ms3-test-q5_k_s.gguf?download=true)
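To run the quant above locally, a minimal `llama-cpp-python` sketch; the filename comes from the link, while the context size and sampling values are arbitrary examples rather than recommendations from this card:

```python
# Hedged sketch: load the Q5_K_S GGUF linked above (assumes it has already been downloaded).
from llama_cpp import Llama

llm = Llama(
    model_path="ignore_ms3-test-q5_k_s.gguf",  # filename from the download link
    n_ctx=8192,        # example context window, size it to your RAM/VRAM
    n_gpu_layers=-1,   # offload everything to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in character."}],
    max_tokens=256,
    temperature=0.8,
)
print(out["choices"][0]["message"]["content"])
```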
+
+ ***

  ## Merge Details
+ ### Merging steps
+
+ ## MS3-test-Merge-1
+
+ ```yaml
+ models:
+   - model: unsloth/Mistral-Small-24B-Base-2501
+   - model: unsloth/Mistral-Small-24B-Instruct-2501+ToastyPigeon/new-ms-rp-test-ws
+     parameters:
+       select_topk:
+         - value: [0.05, 0.03, 0.02, 0.02, 0.01]
+   - model: unsloth/Mistral-Small-24B-Instruct-2501+estrogen/MS2501-24b-Ink-ep2-adpt
+     parameters:
+       select_topk: 0.1
+   - model: trashpanda-org/MS-24B-Instruct-Mullein-v0
+     parameters:
+       select_topk: 0.4
+ base_model: unsloth/Mistral-Small-24B-Base-2501
+ merge_method: sce
+ parameters:
+   int8_mask: true
+   rescale: true
+   normalize: true
+ dtype: bfloat16
+ tokenizer_source: base
+ ```
+
+ ```yaml
+ dtype: bfloat16
+ tokenizer_source: base
+ merge_method: della_linear
+ parameters:
+   density: 0.55
+ base_model: Step1
+ models:
+   - model: unsloth/Mistral-Small-24B-Instruct-2501
+     parameters:
+       weight:
+         - filter: v_proj
+           value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
+         - filter: o_proj
+           value: [1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1]
+         - filter: up_proj
+           value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
+         - filter: gate_proj
+           value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
+         - filter: down_proj
+           value: [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
+         - value: 0
+   - model: Step1
+     parameters:
+       weight:
+         - filter: v_proj
+           value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
+         - filter: o_proj
+           value: [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0]
+         - filter: up_proj
+           value: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+         - filter: gate_proj
+           value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
+         - filter: down_proj
+           value: [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
+         - value: 1
+ ```
+
+ Some early MS3 merge. Not really worth using on its own. Just added it for fun.
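For reference, each YAML block in this section is a mergekit config: save it to a file and run it with the `mergekit-yaml` CLI, or from Python. A minimal sketch, assuming the `run_merge`/`MergeConfiguration` entry points shown in mergekit's README and a hypothetical filename:

```python
# Hedged sketch: run a saved mergekit config (e.g. the step above) programmatically.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("ms3-test-merge-1.yaml", "r", encoding="utf-8") as fp:  # hypothetical filename
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    "./MS3-test-Merge-1",  # hypothetical output directory
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```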
+
+ ## RP-half1
+
+ ```yaml
+ models:
+   - model: ArliAI/Mistral-Small-24B-ArliAI-RPMax-v1.4
+     parameters:
+       weight: 0.2
+       density: 0.7
+   - model: trashpanda-org/Llama3-24B-Mullein-v1
+     parameters:
+       weight: 0.2
+       density: 0.7
+   - model: TheDrummer/Cydonia-24B-v2
+     parameters:
+       weight: 0.2
+       density: 0.7
+ merge_method: della_linear
+ base_model: Nohobby/MS3-test-Merge-1
+ parameters:
+   epsilon: 0.2
+   lambda: 1.1
+ dtype: bfloat16
+ tokenizer:
+   source: base
+ ```
+
+ ## RP-half2
+
+ ```yaml
+ base_model: Nohobby/MS3-test-Merge-1
+ parameters:
+   epsilon: 0.05
+   lambda: 0.9
+   int8_mask: true
+   rescale: true
+   normalize: false
+ dtype: bfloat16
+ tokenizer:
+   source: base
+ merge_method: della
+ models:
+   - model: estrogen/MS2501-24b-Ink-apollo-ep2
+     parameters:
+       weight: [0.1, -0.01, 0.1, -0.02, 0.1]
+       density: [0.6, 0.4, 0.5, 0.4, 0.6]
+   - model: huihui-ai/Mistral-Small-24B-Instruct-2501-abliterated
+     parameters:
+       weight: [0.02, -0.01, 0.02, -0.02, 0.01]
+       density: [0.45, 0.55, 0.45, 0.55, 0.45]
+   - model: ToastyPigeon/ms3-roselily-rp-v2
+     parameters:
+       weight: [0.01, -0.02, 0.02, -0.025, 0.01]
+       density: [0.45, 0.65, 0.45, 0.65, 0.45]
+   - model: PocketDoc/Dans-DangerousWinds-V1.1.1-24b
+     parameters:
+       weight: [0.1, -0.01, 0.1, -0.02, 0.1]
+       density: [0.6, 0.4, 0.5, 0.4, 0.6]
+ ```
+
+ ## RP-whole
+
+ ```yaml
+ base_model: ReadyArt/Forgotten-Safeword-24B-V2.2
+ merge_method: model_stock
+ dtype: bfloat16
+ models:
+   - model: mergekit-community/MS3-RP-half1
+   - model: mergekit-community/MS3-RP-RP-half2
+ ```
+
+ ## INT
+
+ ```yaml
+ merge_method: della_linear
+ dtype: bfloat16
+ parameters:
+   normalize: true
+   int8_mask: true
+ tokenizer:
+   source: base
+ base_model: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
+ models:
+   - model: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
+     parameters:
+       density: 0.55
+       weight: 1
+   - model: Undi95/MistralThinker-e2
+     parameters:
+       density: 0.55
+       weight: 1
+   - model: d-rang-d/ignore_MS3-Reasoner-mergekit
+     parameters:
+       density: 0.55
+       weight: 1
+   - model: arcee-ai/Arcee-Blitz
+     parameters:
+       density: 0.55
+       weight: 1
+ ```
+
+ ## Tantumv00
+
+ ```yaml
+ output_base_model: "SicariusSicariiStuff/Redemption_Wind_24B"
+ output_dtype: "bfloat16"
+ finetune_merge:
+   - { "model": "mergekit-community/MS3-INT", "base": "unsloth/Mistral-Small-24B-Instruct-2501", "alpha": 1.0, "is_input": true }
+   - { "model": "mergekit-community/MS-RP-whole", "base": "unsloth/Mistral-Small-24B-Instruct-2501", "alpha": 0.7, "is_output": true }
+ output_dir: "output_model"
+ device: "cpu"
+ clean_cache: false
+ cache_dir: "cache"
+ storage_dir: "storage"
+ ```
+
+ Doesn't look like a mergekit recipe, right? Well, it's not. It's for a standalone merge tool: https://github.com/54rt1n/shardmerge
+
+ If you want to use it for something non-qwen, you can replace index.py with [this](https://files.catbox.moe/bgxmuz.py) and writer.py with [that](https://files.catbox.moe/ewww39.py). A much better solution is possible, ofc, but I'm a dumdum and can't code. The creator knows about this issue and will fix it... someday, I guess.
+
+ You also need to know that this thing is *really* slow; it took me 5 hours to cram three 24B models together.
+
+ ## Tantumv01

  ```yaml
  dtype: bfloat16
 
  - filter: down_proj
  value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
  - value: 1
+ ```