DavidAU committed on
Commit 7a063b9 · verified · 1 Parent(s): bac8ea9

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,472 @@
---
base_model: []
library_name: transformers
tags:
- mergekit
- merge

---
# Bagel-Multiverse-20B-exp40-3-bf16

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the passthrough merge method.

### Models Merged

The following models were included in the merge:
* E:/Bagel-Multiverse-20B-bg

### Configuration

The following YAML configuration was used to produce this model:

```yaml
# Six splits plus "end game".
# "D" starts at plus .1 vs D/O proj.
# 40 plus.
# Formula 3

slices:
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [0, 48]

# D/G Block Settings.
# split 1: .11 to .61 [.04 G] [4 layers]
# split 2: .15 to .41 [.08 G] [4 layers]
# split 3: .19 to .35 [.23 G] [12 layers]
# split 4: .11 to .41 [.244 G] [4 layers]
# split 5: .15 to .3 [.248 G] [4 layers]
# split 6: .19 to .3 [.256 G] [7 layers]
# final [D/G]: .33,.44,.55,.66,.77/.88 [5 layers]

# conc layers
# split 1

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.01
            - filter: down_proj
              value: 0.01
            - value: 0.11
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.02
            - filter: down_proj
              value: 0.02
            - value: 0.12
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.03
            - filter: down_proj
              value: 0.03
            - value: 0.13

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.04
            - filter: down_proj
              value: 0.04
            - value: 0.61

# split 2, SURGE D THEN D drop .46, continues @ D .15 (from .13)

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.05
            - filter: down_proj
              value: 0.05
            - value: 0.15
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.06
            - filter: down_proj
              value: 0.06
            - value: 0.16
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.07
            - filter: down_proj
              value: 0.07
            - value: 0.17
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.08
            - filter: down_proj
              value: 0.08
            - value: 0.41

# split 3, SURGE D to .41, D drop .21 ... follows .17 previous

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.09
            - filter: down_proj
              value: 0.09
            - value: 0.19
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.10
            - filter: down_proj
              value: 0.10
            - value: 0.20
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.11
            - filter: down_proj
              value: 0.11
            - value: 0.22
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.12
            - filter: down_proj
              value: 0.12
            - value: 0.24
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.13
            - filter: down_proj
              value: 0.13
            - value: 0.26
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.14
            - filter: down_proj
              value: 0.14
            - value: 0.28
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.15
            - filter: down_proj
              value: 0.15
            - value: 0.30
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.16
            - filter: down_proj
              value: 0.16
            - value: 0.31
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.20
            - filter: down_proj
              value: 0.20
            - value: 0.32
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.21
            - filter: down_proj
              value: 0.21
            - value: 0.33
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.22
            - filter: down_proj
              value: 0.22
            - value: 0.34
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.23
            - filter: down_proj
              value: 0.23
            - value: 0.35

# split 4, NO SURGE D, "D" down drop of .24; reverts to .11 (the very first "D" setting)

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.24
            - filter: down_proj
              value: 0.24
            - value: 0.11
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.241
            - filter: down_proj
              value: 0.241
            - value: 0.12
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.242
            - filter: down_proj
              value: 0.243
            - value: 0.13
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.244
            - filter: down_proj
              value: 0.244
            - value: 0.61

# split 5, D Surge to .61, drop to .15 (following .13)

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.245
            - filter: down_proj
              value: 0.245
            - value: 0.15
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.246
            - filter: down_proj
              value: 0.246
            - value: 0.16
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.247
            - filter: down_proj
              value: 0.247
            - value: 0.17
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.248
            - filter: down_proj
              value: 0.248
            - value: 0.41

# split 6, D surge to .41, then follows .17

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.249
            - filter: down_proj
              value: 0.249
            - value: 0.19
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.250
            - filter: down_proj
              value: 0.250
            - value: 0.20
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.251
            - filter: down_proj
              value: 0.251
            - value: 0.22
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.252
            - filter: down_proj
              value: 0.252
            - value: 0.24
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.253
            - filter: down_proj
              value: 0.254
            - value: 0.26
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.255
            - filter: down_proj
              value: 0.255
            - value: 0.28
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.256
            - filter: down_proj
              value: 0.256
            - value: 0.30

# O PROJ, D PROJ to .3333 /
# end game

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.3333333333333
            - filter: down_proj
              value: 0.3333333333333
            - value: 0.3333333333333
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.4444444444444
            - filter: down_proj
              value: 0.4444444444444
            - value: 0.4444444444444
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.5555555555555
            - filter: down_proj
              value: 0.5555555555555
            - value: 0.5555555555555
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.6666666666666
            - filter: down_proj
              value: 0.6666666666666
            - value: 0.6666666666666
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.777777777777
            - filter: down_proj
              value: 0.777777777777
            - value: 0.888888888888
merge_method: passthrough
dtype: float16
```
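As a quick sanity check (a sketch added for illustration, not part of the original card): the passthrough config above stacks the 48 base layers plus 40 scaled re-uses of layer 48, which matches the 88 hidden layers reported in this repo's config.json.

```python
# Sketch: count the layers produced by the slice list above.
# Each "- sources" block after the first contributes one layer
# (layer_range [48, 49]), per the split comments in the config.
base_layers = 48
split_sizes = [4, 4, 12, 4, 4, 7, 5]  # splits 1-6 plus the "end game" block

total_layers = base_layers + sum(split_sizes)
print(total_layers)  # → 88, matching num_hidden_layers in config.json
```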
config.json ADDED
@@ -0,0 +1,28 @@
{
  "_name_or_path": "E:/Bagel-Multiverse-20B-bg",
  "add_gates": false,
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 88,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 32000
}
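A second sanity check (a sketch, assuming the standard Mistral parameter layout: untied lm_head, grouped-query attention, SwiGLU MLP, two RMSNorms per layer): the config values above reproduce exactly the `total_size` recorded in model.safetensors.index.json below.

```python
# Sketch: derive the fp16 checkpoint size from config.json values.
hidden, inter, layers = 4096, 14336, 88
vocab, kv_heads, head_dim = 32000, 8, 128
kv = kv_heads * head_dim  # 1024-dim k/v projections (GQA)

per_layer = (
    hidden * hidden        # q_proj
    + 2 * kv * hidden      # k_proj, v_proj
    + hidden * hidden      # o_proj
    + 3 * hidden * inter   # gate_proj, up_proj, down_proj
    + 2 * hidden           # the two RMSNorm weights
)
# embed_tokens + separate lm_head (tie_word_embeddings: false) + final norm
params = 2 * vocab * hidden + layers * per_layer + hidden
print(params * 2)  # → 38912008192 bytes at 2 bytes/param (float16)
```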
mergekit_config.yml ADDED
@@ -0,0 +1,444 @@
# Six splits plus "end game".
# "D" starts at plus .1 vs D/O proj.
# 40 plus.
# Formula 3

slices:
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [0, 48]

# D/G Block Settings.
# split 1: .11 to .61 [.04 G] [4 layers]
# split 2: .15 to .41 [.08 G] [4 layers]
# split 3: .19 to .35 [.23 G] [12 layers]
# split 4: .11 to .41 [.244 G] [4 layers]
# split 5: .15 to .3 [.248 G] [4 layers]
# split 6: .19 to .3 [.256 G] [7 layers]
# final [D/G]: .33,.44,.55,.66,.77/.88 [5 layers]

# conc layers
# split 1

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.01
            - filter: down_proj
              value: 0.01
            - value: 0.11
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.02
            - filter: down_proj
              value: 0.02
            - value: 0.12
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.03
            - filter: down_proj
              value: 0.03
            - value: 0.13

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.04
            - filter: down_proj
              value: 0.04
            - value: 0.61

# split 2, SURGE D THEN D drop .46, continues @ D .15 (from .13)

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.05
            - filter: down_proj
              value: 0.05
            - value: 0.15
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.06
            - filter: down_proj
              value: 0.06
            - value: 0.16
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.07
            - filter: down_proj
              value: 0.07
            - value: 0.17
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.08
            - filter: down_proj
              value: 0.08
            - value: 0.41

# split 3, SURGE D to .41, D drop .21 ... follows .17 previous

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.09
            - filter: down_proj
              value: 0.09
            - value: 0.19
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.10
            - filter: down_proj
              value: 0.10
            - value: 0.20
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.11
            - filter: down_proj
              value: 0.11
            - value: 0.22
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.12
            - filter: down_proj
              value: 0.12
            - value: 0.24
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.13
            - filter: down_proj
              value: 0.13
            - value: 0.26
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.14
            - filter: down_proj
              value: 0.14
            - value: 0.28
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.15
            - filter: down_proj
              value: 0.15
            - value: 0.30
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.16
            - filter: down_proj
              value: 0.16
            - value: 0.31
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.20
            - filter: down_proj
              value: 0.20
            - value: 0.32
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.21
            - filter: down_proj
              value: 0.21
            - value: 0.33
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.22
            - filter: down_proj
              value: 0.22
            - value: 0.34
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.23
            - filter: down_proj
              value: 0.23
            - value: 0.35

# split 4, NO SURGE D, "D" down drop of .24; reverts to .11 (the very first "D" setting)

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.24
            - filter: down_proj
              value: 0.24
            - value: 0.11
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.241
            - filter: down_proj
              value: 0.241
            - value: 0.12
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.242
            - filter: down_proj
              value: 0.243
            - value: 0.13
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.244
            - filter: down_proj
              value: 0.244
            - value: 0.61

# split 5, D Surge to .61, drop to .15 (following .13)

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.245
            - filter: down_proj
              value: 0.245
            - value: 0.15
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.246
            - filter: down_proj
              value: 0.246
            - value: 0.16
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.247
            - filter: down_proj
              value: 0.247
            - value: 0.17
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.248
            - filter: down_proj
              value: 0.248
            - value: 0.41

# split 6, D surge to .41, then follows .17

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.249
            - filter: down_proj
              value: 0.249
            - value: 0.19
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.250
            - filter: down_proj
              value: 0.250
            - value: 0.20
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.251
            - filter: down_proj
              value: 0.251
            - value: 0.22
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.252
            - filter: down_proj
              value: 0.252
            - value: 0.24
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.253
            - filter: down_proj
              value: 0.254
            - value: 0.26
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.255
            - filter: down_proj
              value: 0.255
            - value: 0.28
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.256
            - filter: down_proj
              value: 0.256
            - value: 0.30

# O PROJ, D PROJ to .3333 /
# end game

  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.3333333333333
            - filter: down_proj
              value: 0.3333333333333
            - value: 0.3333333333333
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.4444444444444
            - filter: down_proj
              value: 0.4444444444444
            - value: 0.4444444444444
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.5555555555555
            - filter: down_proj
              value: 0.5555555555555
            - value: 0.5555555555555
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.6666666666666
            - filter: down_proj
              value: 0.6666666666666
            - value: 0.6666666666666
  - sources:
      - model: E:/Bagel-Multiverse-20B-bg
        layer_range: [48, 49]
        parameters:
          scale:
            - filter: o_proj
              value: 0.777777777777
            - filter: down_proj
              value: 0.777777777777
            - value: 0.888888888888
merge_method: passthrough
dtype: float16
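Mergekit's `scale` parameter multiplies the weights of tensors whose name matches a filter, with the filter-less entry acting as the default for everything else; this is what damps the duplicated copies of layer 48 above. A minimal sketch of that resolution logic (illustrative only, not mergekit internals; first matching filter wins):

```python
# Sketch: resolve a mergekit-style scale list against a layer's tensors.
def apply_scale(weights, filters):
    """filters: list of (name_substring_or_None, value); None = default."""
    out = {}
    for name, w in weights.items():
        for f, v in filters:
            if f is None or f in name:
                out[name] = [x * v for x in w]
                break
    return out

# One duplicated layer, scaled like the first "split 1" block above.
layer = {
    "self_attn.o_proj.weight": [1.0, 2.0],
    "mlp.down_proj.weight": [1.0],
    "mlp.gate_proj.weight": [1.0],
}
scaled = apply_scale(layer, [("o_proj", 0.01), ("down_proj", 0.01), (None, 0.11)])
# o_proj/down_proj are damped to 1% while the remaining tensors get 0.11.
```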
model.safetensors.index.json ADDED
@@ -0,0 +1 @@
+ {"metadata": {"mergekit_version": "0.0.4.4", "total_size": 38912008192}, "weight_map": {"lm_head.weight": "model-00001-of-00041.safetensors", "model.embed_tokens.weight": "model-00001-of-00041.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00041.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00041.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00041.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00041.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00041.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00041.safetensors", "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00041.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00041.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00041.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00041.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00002-of-00041.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00002-of-00041.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00002-of-00041.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00041.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00002-of-00041.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00002-of-00041.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00041.safetensors", "model.layers.1.self_attn.v_proj.weight": "model-00002-of-00041.safetensors", "model.layers.10.input_layernorm.weight": "model-00002-of-00041.safetensors", "model.layers.10.mlp.down_proj.weight": "model-00002-of-00041.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00041.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00002-of-00041.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00041.safetensors", 
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00041.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00041.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00041.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00041.safetensors", "model.layers.11.input_layernorm.weight": "model-00002-of-00041.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00002-of-00041.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00041.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00003-of-00041.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00041.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00041.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00041.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00041.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00041.safetensors", "model.layers.12.input_layernorm.weight": "model-00003-of-00041.safetensors", "model.layers.12.mlp.down_proj.weight": "model-00003-of-00041.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00041.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00003-of-00041.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00041.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00041.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00041.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00041.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00041.safetensors", "model.layers.13.input_layernorm.weight": "model-00003-of-00041.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00003-of-00041.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00041.safetensors", "model.layers.13.mlp.up_proj.weight": 
"model-00004-of-00041.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00041.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00041.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00041.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00041.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00041.safetensors", "model.layers.14.input_layernorm.weight": "model-00004-of-00041.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00004-of-00041.safetensors", "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00041.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00004-of-00041.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00041.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00041.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00041.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00041.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00041.safetensors", "model.layers.15.input_layernorm.weight": "model-00004-of-00041.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00004-of-00041.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00041.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00004-of-00041.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00041.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00041.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00005-of-00041.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00005-of-00041.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00005-of-00041.safetensors", "model.layers.16.input_layernorm.weight": "model-00005-of-00041.safetensors", "model.layers.16.mlp.down_proj.weight": 
"model-00005-of-00041.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00005-of-00041.safetensors", "model.layers.16.mlp.up_proj.weight": "model-00005-of-00041.safetensors", "model.layers.16.post_attention_layernorm.weight": "model-00005-of-00041.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00005-of-00041.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00005-of-00041.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00005-of-00041.safetensors", "model.layers.16.self_attn.v_proj.weight": "model-00005-of-00041.safetensors", "model.layers.17.input_layernorm.weight": "model-00005-of-00041.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00005-of-00041.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00041.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00005-of-00041.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00041.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00041.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00041.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00041.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00041.safetensors", "model.layers.18.input_layernorm.weight": "model-00005-of-00041.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00006-of-00041.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00006-of-00041.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00006-of-00041.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00006-of-00041.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00006-of-00041.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00006-of-00041.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00006-of-00041.safetensors", "model.layers.18.self_attn.v_proj.weight": 
"model-00006-of-00041.safetensors", "model.layers.19.input_layernorm.weight": "model-00006-of-00041.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00006-of-00041.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00006-of-00041.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00006-of-00041.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00006-of-00041.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00006-of-00041.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00006-of-00041.safetensors", "model.layers.19.self_attn.q_proj.weight": "model-00006-of-00041.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00006-of-00041.safetensors", "model.layers.2.input_layernorm.weight": "model-00006-of-00041.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00006-of-00041.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00007-of-00041.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00007-of-00041.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00007-of-00041.safetensors", "model.layers.2.self_attn.k_proj.weight": "model-00007-of-00041.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00007-of-00041.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00007-of-00041.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00007-of-00041.safetensors", "model.layers.20.input_layernorm.weight": "model-00007-of-00041.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00007-of-00041.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00007-of-00041.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00007-of-00041.safetensors", "model.layers.20.post_attention_layernorm.weight": "model-00007-of-00041.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00007-of-00041.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00007-of-00041.safetensors", 
"model.layers.20.self_attn.q_proj.weight": "model-00007-of-00041.safetensors", "model.layers.20.self_attn.v_proj.weight": "model-00007-of-00041.safetensors", "model.layers.21.input_layernorm.weight": "model-00007-of-00041.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00007-of-00041.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00007-of-00041.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00008-of-00041.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00008-of-00041.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00008-of-00041.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00008-of-00041.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00008-of-00041.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00008-of-00041.safetensors", "model.layers.22.input_layernorm.weight": "model-00008-of-00041.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00008-of-00041.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00008-of-00041.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00008-of-00041.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00008-of-00041.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00008-of-00041.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00008-of-00041.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00008-of-00041.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00008-of-00041.safetensors", "model.layers.23.input_layernorm.weight": "model-00008-of-00041.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00008-of-00041.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00008-of-00041.safetensors", "model.layers.23.mlp.up_proj.weight": "model-00008-of-00041.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00008-of-00041.safetensors", 
"model.layers.23.self_attn.k_proj.weight": "model-00008-of-00041.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00009-of-00041.safetensors", "model.layers.23.self_attn.q_proj.weight": "model-00009-of-00041.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00009-of-00041.safetensors", "model.layers.24.input_layernorm.weight": "model-00009-of-00041.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00009-of-00041.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00009-of-00041.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00009-of-00041.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00009-of-00041.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00009-of-00041.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00009-of-00041.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00009-of-00041.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00009-of-00041.safetensors", "model.layers.25.input_layernorm.weight": "model-00009-of-00041.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00009-of-00041.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00009-of-00041.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00009-of-00041.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00009-of-00041.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00009-of-00041.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00009-of-00041.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00009-of-00041.safetensors", "model.layers.25.self_attn.v_proj.weight": "model-00009-of-00041.safetensors", "model.layers.26.input_layernorm.weight": "model-00009-of-00041.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00010-of-00041.safetensors", "model.layers.26.mlp.gate_proj.weight": "model-00010-of-00041.safetensors", "model.layers.26.mlp.up_proj.weight": 
"model-00010-of-00041.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00010-of-00041.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00010-of-00041.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00010-of-00041.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00010-of-00041.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00010-of-00041.safetensors", "model.layers.27.input_layernorm.weight": "model-00010-of-00041.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00010-of-00041.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00010-of-00041.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00010-of-00041.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00010-of-00041.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00010-of-00041.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00010-of-00041.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00010-of-00041.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00010-of-00041.safetensors", "model.layers.28.input_layernorm.weight": "model-00010-of-00041.safetensors", "model.layers.28.mlp.down_proj.weight": "model-00010-of-00041.safetensors", "model.layers.28.mlp.gate_proj.weight": "model-00011-of-00041.safetensors", "model.layers.28.mlp.up_proj.weight": "model-00011-of-00041.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00011-of-00041.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00011-of-00041.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00011-of-00041.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00011-of-00041.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00011-of-00041.safetensors", "model.layers.29.input_layernorm.weight": "model-00011-of-00041.safetensors", "model.layers.29.mlp.down_proj.weight": 
"model-00011-of-00041.safetensors", "model.layers.29.mlp.gate_proj.weight": "model-00011-of-00041.safetensors", "model.layers.29.mlp.up_proj.weight": "model-00011-of-00041.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00011-of-00041.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00011-of-00041.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00011-of-00041.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00011-of-00041.safetensors", "model.layers.29.self_attn.v_proj.weight": "model-00011-of-00041.safetensors", "model.layers.3.input_layernorm.weight": "model-00011-of-00041.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00011-of-00041.safetensors", "model.layers.3.mlp.gate_proj.weight": "model-00011-of-00041.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00012-of-00041.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00012-of-00041.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00012-of-00041.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00012-of-00041.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00012-of-00041.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00012-of-00041.safetensors", "model.layers.30.input_layernorm.weight": "model-00012-of-00041.safetensors", "model.layers.30.mlp.down_proj.weight": "model-00012-of-00041.safetensors", "model.layers.30.mlp.gate_proj.weight": "model-00012-of-00041.safetensors", "model.layers.30.mlp.up_proj.weight": "model-00012-of-00041.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00012-of-00041.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00012-of-00041.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00012-of-00041.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00012-of-00041.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00012-of-00041.safetensors", 
"model.layers.31.input_layernorm.weight": "model-00012-of-00041.safetensors", "model.layers.31.mlp.down_proj.weight": "model-00012-of-00041.safetensors", "model.layers.31.mlp.gate_proj.weight": "model-00012-of-00041.safetensors", "model.layers.31.mlp.up_proj.weight": "model-00012-of-00041.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00012-of-00041.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00012-of-00041.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00013-of-00041.safetensors", "model.layers.31.self_attn.q_proj.weight": "model-00013-of-00041.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00013-of-00041.safetensors", "model.layers.32.input_layernorm.weight": "model-00013-of-00041.safetensors", "model.layers.32.mlp.down_proj.weight": "model-00013-of-00041.safetensors", "model.layers.32.mlp.gate_proj.weight": "model-00013-of-00041.safetensors", "model.layers.32.mlp.up_proj.weight": "model-00013-of-00041.safetensors", "model.layers.32.post_attention_layernorm.weight": "model-00013-of-00041.safetensors", "model.layers.32.self_attn.k_proj.weight": "model-00013-of-00041.safetensors", "model.layers.32.self_attn.o_proj.weight": "model-00013-of-00041.safetensors", "model.layers.32.self_attn.q_proj.weight": "model-00013-of-00041.safetensors", "model.layers.32.self_attn.v_proj.weight": "model-00013-of-00041.safetensors", "model.layers.33.input_layernorm.weight": "model-00013-of-00041.safetensors", "model.layers.33.mlp.down_proj.weight": "model-00013-of-00041.safetensors", "model.layers.33.mlp.gate_proj.weight": "model-00013-of-00041.safetensors", "model.layers.33.mlp.up_proj.weight": "model-00013-of-00041.safetensors", "model.layers.33.post_attention_layernorm.weight": "model-00013-of-00041.safetensors", "model.layers.33.self_attn.k_proj.weight": "model-00013-of-00041.safetensors", "model.layers.33.self_attn.o_proj.weight": "model-00013-of-00041.safetensors", 
"model.layers.33.self_attn.q_proj.weight": "model-00013-of-00041.safetensors", "model.layers.33.self_attn.v_proj.weight": "model-00013-of-00041.safetensors", "model.layers.34.input_layernorm.weight": "model-00013-of-00041.safetensors", "model.layers.34.mlp.down_proj.weight": "model-00014-of-00041.safetensors", "model.layers.34.mlp.gate_proj.weight": "model-00014-of-00041.safetensors", "model.layers.34.mlp.up_proj.weight": "model-00014-of-00041.safetensors", "model.layers.34.post_attention_layernorm.weight": "model-00014-of-00041.safetensors", "model.layers.34.self_attn.k_proj.weight": "model-00014-of-00041.safetensors", "model.layers.34.self_attn.o_proj.weight": "model-00014-of-00041.safetensors", "model.layers.34.self_attn.q_proj.weight": "model-00014-of-00041.safetensors", "model.layers.34.self_attn.v_proj.weight": "model-00014-of-00041.safetensors", "model.layers.35.input_layernorm.weight": "model-00014-of-00041.safetensors", "model.layers.35.mlp.down_proj.weight": "model-00014-of-00041.safetensors", "model.layers.35.mlp.gate_proj.weight": "model-00014-of-00041.safetensors", "model.layers.35.mlp.up_proj.weight": "model-00014-of-00041.safetensors", "model.layers.35.post_attention_layernorm.weight": "model-00014-of-00041.safetensors", "model.layers.35.self_attn.k_proj.weight": "model-00014-of-00041.safetensors", "model.layers.35.self_attn.o_proj.weight": "model-00014-of-00041.safetensors", "model.layers.35.self_attn.q_proj.weight": "model-00014-of-00041.safetensors", "model.layers.35.self_attn.v_proj.weight": "model-00014-of-00041.safetensors", "model.layers.36.input_layernorm.weight": "model-00014-of-00041.safetensors", "model.layers.36.mlp.down_proj.weight": "model-00014-of-00041.safetensors", "model.layers.36.mlp.gate_proj.weight": "model-00015-of-00041.safetensors", "model.layers.36.mlp.up_proj.weight": "model-00015-of-00041.safetensors", "model.layers.36.post_attention_layernorm.weight": "model-00015-of-00041.safetensors", 
"model.layers.36.self_attn.k_proj.weight": "model-00015-of-00041.safetensors", "model.layers.36.self_attn.o_proj.weight": "model-00015-of-00041.safetensors", "model.layers.36.self_attn.q_proj.weight": "model-00015-of-00041.safetensors", "model.layers.36.self_attn.v_proj.weight": "model-00015-of-00041.safetensors", "model.layers.37.input_layernorm.weight": "model-00015-of-00041.safetensors", "model.layers.37.mlp.down_proj.weight": "model-00015-of-00041.safetensors", "model.layers.37.mlp.gate_proj.weight": "model-00015-of-00041.safetensors", "model.layers.37.mlp.up_proj.weight": "model-00015-of-00041.safetensors", "model.layers.37.post_attention_layernorm.weight": "model-00015-of-00041.safetensors", "model.layers.37.self_attn.k_proj.weight": "model-00015-of-00041.safetensors", "model.layers.37.self_attn.o_proj.weight": "model-00015-of-00041.safetensors", "model.layers.37.self_attn.q_proj.weight": "model-00015-of-00041.safetensors", "model.layers.37.self_attn.v_proj.weight": "model-00015-of-00041.safetensors", "model.layers.38.input_layernorm.weight": "model-00015-of-00041.safetensors", "model.layers.38.mlp.down_proj.weight": "model-00015-of-00041.safetensors", "model.layers.38.mlp.gate_proj.weight": "model-00015-of-00041.safetensors", "model.layers.38.mlp.up_proj.weight": "model-00016-of-00041.safetensors", "model.layers.38.post_attention_layernorm.weight": "model-00016-of-00041.safetensors", "model.layers.38.self_attn.k_proj.weight": "model-00016-of-00041.safetensors", "model.layers.38.self_attn.o_proj.weight": "model-00016-of-00041.safetensors", "model.layers.38.self_attn.q_proj.weight": "model-00016-of-00041.safetensors", "model.layers.38.self_attn.v_proj.weight": "model-00016-of-00041.safetensors", "model.layers.39.input_layernorm.weight": "model-00016-of-00041.safetensors", "model.layers.39.mlp.down_proj.weight": "model-00016-of-00041.safetensors", "model.layers.39.mlp.gate_proj.weight": "model-00016-of-00041.safetensors", "model.layers.39.mlp.up_proj.weight": 
"model-00016-of-00041.safetensors", "model.layers.39.post_attention_layernorm.weight": "model-00016-of-00041.safetensors", "model.layers.39.self_attn.k_proj.weight": "model-00016-of-00041.safetensors", "model.layers.39.self_attn.o_proj.weight": "model-00016-of-00041.safetensors", "model.layers.39.self_attn.q_proj.weight": "model-00016-of-00041.safetensors", "model.layers.39.self_attn.v_proj.weight": "model-00016-of-00041.safetensors", "model.layers.4.input_layernorm.weight": "model-00016-of-00041.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00016-of-00041.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00016-of-00041.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00016-of-00041.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00016-of-00041.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00016-of-00041.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00017-of-00041.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00017-of-00041.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00017-of-00041.safetensors", "model.layers.40.input_layernorm.weight": "model-00017-of-00041.safetensors", "model.layers.40.mlp.down_proj.weight": "model-00017-of-00041.safetensors", "model.layers.40.mlp.gate_proj.weight": "model-00017-of-00041.safetensors", "model.layers.40.mlp.up_proj.weight": "model-00017-of-00041.safetensors", "model.layers.40.post_attention_layernorm.weight": "model-00017-of-00041.safetensors", "model.layers.40.self_attn.k_proj.weight": "model-00017-of-00041.safetensors", "model.layers.40.self_attn.o_proj.weight": "model-00017-of-00041.safetensors", "model.layers.40.self_attn.q_proj.weight": "model-00017-of-00041.safetensors", "model.layers.40.self_attn.v_proj.weight": "model-00017-of-00041.safetensors", "model.layers.41.input_layernorm.weight": "model-00017-of-00041.safetensors", "model.layers.41.mlp.down_proj.weight": "model-00017-of-00041.safetensors", 
"model.layers.41.mlp.gate_proj.weight": "model-00017-of-00041.safetensors", "model.layers.41.mlp.up_proj.weight": "model-00017-of-00041.safetensors", "model.layers.41.post_attention_layernorm.weight": "model-00017-of-00041.safetensors", "model.layers.41.self_attn.k_proj.weight": "model-00017-of-00041.safetensors", "model.layers.41.self_attn.o_proj.weight": "model-00017-of-00041.safetensors", "model.layers.41.self_attn.q_proj.weight": "model-00017-of-00041.safetensors", "model.layers.41.self_attn.v_proj.weight": "model-00017-of-00041.safetensors", "model.layers.42.input_layernorm.weight": "model-00017-of-00041.safetensors", "model.layers.42.mlp.down_proj.weight": "model-00018-of-00041.safetensors", "model.layers.42.mlp.gate_proj.weight": "model-00018-of-00041.safetensors", "model.layers.42.mlp.up_proj.weight": "model-00018-of-00041.safetensors", "model.layers.42.post_attention_layernorm.weight": "model-00018-of-00041.safetensors", "model.layers.42.self_attn.k_proj.weight": "model-00018-of-00041.safetensors", "model.layers.42.self_attn.o_proj.weight": "model-00018-of-00041.safetensors", "model.layers.42.self_attn.q_proj.weight": "model-00018-of-00041.safetensors", "model.layers.42.self_attn.v_proj.weight": "model-00018-of-00041.safetensors", "model.layers.43.input_layernorm.weight": "model-00018-of-00041.safetensors", "model.layers.43.mlp.down_proj.weight": "model-00018-of-00041.safetensors", "model.layers.43.mlp.gate_proj.weight": "model-00018-of-00041.safetensors", "model.layers.43.mlp.up_proj.weight": "model-00018-of-00041.safetensors", "model.layers.43.post_attention_layernorm.weight": "model-00018-of-00041.safetensors", "model.layers.43.self_attn.k_proj.weight": "model-00018-of-00041.safetensors", "model.layers.43.self_attn.o_proj.weight": "model-00018-of-00041.safetensors", "model.layers.43.self_attn.q_proj.weight": "model-00018-of-00041.safetensors", "model.layers.43.self_attn.v_proj.weight": "model-00018-of-00041.safetensors", 
"model.layers.44.input_layernorm.weight": "model-00018-of-00041.safetensors", "model.layers.44.mlp.down_proj.weight": "model-00018-of-00041.safetensors", "model.layers.44.mlp.gate_proj.weight": "model-00019-of-00041.safetensors", "model.layers.44.mlp.up_proj.weight": "model-00019-of-00041.safetensors", "model.layers.44.post_attention_layernorm.weight": "model-00019-of-00041.safetensors", "model.layers.44.self_attn.k_proj.weight": "model-00019-of-00041.safetensors", "model.layers.44.self_attn.o_proj.weight": "model-00019-of-00041.safetensors", "model.layers.44.self_attn.q_proj.weight": "model-00019-of-00041.safetensors", "model.layers.44.self_attn.v_proj.weight": "model-00019-of-00041.safetensors", "model.layers.45.input_layernorm.weight": "model-00019-of-00041.safetensors", "model.layers.45.mlp.down_proj.weight": "model-00019-of-00041.safetensors", "model.layers.45.mlp.gate_proj.weight": "model-00019-of-00041.safetensors", "model.layers.45.mlp.up_proj.weight": "model-00019-of-00041.safetensors", "model.layers.45.post_attention_layernorm.weight": "model-00019-of-00041.safetensors", "model.layers.45.self_attn.k_proj.weight": "model-00019-of-00041.safetensors", "model.layers.45.self_attn.o_proj.weight": "model-00019-of-00041.safetensors", "model.layers.45.self_attn.q_proj.weight": "model-00019-of-00041.safetensors", "model.layers.45.self_attn.v_proj.weight": "model-00019-of-00041.safetensors", "model.layers.46.input_layernorm.weight": "model-00019-of-00041.safetensors", "model.layers.46.mlp.down_proj.weight": "model-00019-of-00041.safetensors", "model.layers.46.mlp.gate_proj.weight": "model-00019-of-00041.safetensors", "model.layers.46.mlp.up_proj.weight": "model-00020-of-00041.safetensors", "model.layers.46.post_attention_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.46.self_attn.k_proj.weight": "model-00020-of-00041.safetensors", "model.layers.46.self_attn.o_proj.weight": "model-00020-of-00041.safetensors", 
"model.layers.46.self_attn.q_proj.weight": "model-00020-of-00041.safetensors", "model.layers.46.self_attn.v_proj.weight": "model-00020-of-00041.safetensors", "model.layers.47.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.47.mlp.down_proj.weight": "model-00020-of-00041.safetensors", "model.layers.47.mlp.gate_proj.weight": "model-00020-of-00041.safetensors", "model.layers.47.mlp.up_proj.weight": "model-00020-of-00041.safetensors", "model.layers.47.post_attention_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.47.self_attn.k_proj.weight": "model-00020-of-00041.safetensors", "model.layers.47.self_attn.o_proj.weight": "model-00020-of-00041.safetensors", "model.layers.47.self_attn.q_proj.weight": "model-00020-of-00041.safetensors", "model.layers.47.self_attn.v_proj.weight": "model-00020-of-00041.safetensors", "model.layers.87.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.86.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.85.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.84.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.83.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.82.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.81.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.80.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.79.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.78.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.77.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.76.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.75.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.74.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.73.input_layernorm.weight": 
"model-00020-of-00041.safetensors", "model.layers.72.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.71.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.70.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.69.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.68.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.67.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.66.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.65.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.64.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.63.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.62.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.61.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.60.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.59.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.58.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.57.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.56.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.55.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.54.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.53.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.52.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.51.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.50.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.49.input_layernorm.weight": "model-00020-of-00041.safetensors", "model.layers.48.input_layernorm.weight": "model-00020-of-00041.safetensors", 
"model.layers.87.mlp.down_proj.weight": "model-00020-of-00041.safetensors", "model.layers.86.mlp.down_proj.weight": "model-00020-of-00041.safetensors", "model.layers.85.mlp.down_proj.weight": "model-00020-of-00041.safetensors", "model.layers.84.mlp.down_proj.weight": "model-00021-of-00041.safetensors", "model.layers.83.mlp.down_proj.weight": "model-00021-of-00041.safetensors", "model.layers.82.mlp.down_proj.weight": "model-00021-of-00041.safetensors", "model.layers.81.mlp.down_proj.weight": "model-00021-of-00041.safetensors", "model.layers.80.mlp.down_proj.weight": "model-00021-of-00041.safetensors", "model.layers.79.mlp.down_proj.weight": "model-00021-of-00041.safetensors", "model.layers.78.mlp.down_proj.weight": "model-00021-of-00041.safetensors", "model.layers.77.mlp.down_proj.weight": "model-00021-of-00041.safetensors", "model.layers.76.mlp.down_proj.weight": "model-00022-of-00041.safetensors", "model.layers.75.mlp.down_proj.weight": "model-00022-of-00041.safetensors", "model.layers.74.mlp.down_proj.weight": "model-00022-of-00041.safetensors", "model.layers.73.mlp.down_proj.weight": "model-00022-of-00041.safetensors", "model.layers.72.mlp.down_proj.weight": "model-00022-of-00041.safetensors", "model.layers.71.mlp.down_proj.weight": "model-00022-of-00041.safetensors", "model.layers.70.mlp.down_proj.weight": "model-00022-of-00041.safetensors", "model.layers.69.mlp.down_proj.weight": "model-00022-of-00041.safetensors", "model.layers.68.mlp.down_proj.weight": "model-00023-of-00041.safetensors", "model.layers.67.mlp.down_proj.weight": "model-00023-of-00041.safetensors", "model.layers.66.mlp.down_proj.weight": "model-00023-of-00041.safetensors", "model.layers.65.mlp.down_proj.weight": "model-00023-of-00041.safetensors", "model.layers.64.mlp.down_proj.weight": "model-00023-of-00041.safetensors", "model.layers.63.mlp.down_proj.weight": "model-00023-of-00041.safetensors", "model.layers.62.mlp.down_proj.weight": "model-00023-of-00041.safetensors", 
"model.layers.61.mlp.down_proj.weight": "model-00023-of-00041.safetensors", "model.layers.60.mlp.down_proj.weight": "model-00024-of-00041.safetensors", "model.layers.59.mlp.down_proj.weight": "model-00024-of-00041.safetensors", "model.layers.58.mlp.down_proj.weight": "model-00024-of-00041.safetensors", "model.layers.57.mlp.down_proj.weight": "model-00024-of-00041.safetensors", "model.layers.56.mlp.down_proj.weight": "model-00024-of-00041.safetensors", "model.layers.55.mlp.down_proj.weight": "model-00024-of-00041.safetensors", "model.layers.54.mlp.down_proj.weight": "model-00024-of-00041.safetensors", "model.layers.53.mlp.down_proj.weight": "model-00024-of-00041.safetensors", "model.layers.52.mlp.down_proj.weight": "model-00025-of-00041.safetensors", "model.layers.51.mlp.down_proj.weight": "model-00025-of-00041.safetensors", "model.layers.50.mlp.down_proj.weight": "model-00025-of-00041.safetensors", "model.layers.49.mlp.down_proj.weight": "model-00025-of-00041.safetensors", "model.layers.48.mlp.down_proj.weight": "model-00025-of-00041.safetensors", "model.layers.87.mlp.gate_proj.weight": "model-00025-of-00041.safetensors", "model.layers.86.mlp.gate_proj.weight": "model-00025-of-00041.safetensors", "model.layers.85.mlp.gate_proj.weight": "model-00025-of-00041.safetensors", "model.layers.84.mlp.gate_proj.weight": "model-00026-of-00041.safetensors", "model.layers.83.mlp.gate_proj.weight": "model-00026-of-00041.safetensors", "model.layers.82.mlp.gate_proj.weight": "model-00026-of-00041.safetensors", "model.layers.81.mlp.gate_proj.weight": "model-00026-of-00041.safetensors", "model.layers.80.mlp.gate_proj.weight": "model-00026-of-00041.safetensors", "model.layers.79.mlp.gate_proj.weight": "model-00026-of-00041.safetensors", "model.layers.78.mlp.gate_proj.weight": "model-00026-of-00041.safetensors", "model.layers.77.mlp.gate_proj.weight": "model-00026-of-00041.safetensors", "model.layers.76.mlp.gate_proj.weight": "model-00027-of-00041.safetensors", 
"model.layers.75.mlp.gate_proj.weight": "model-00027-of-00041.safetensors", "model.layers.74.mlp.gate_proj.weight": "model-00027-of-00041.safetensors", "model.layers.73.mlp.gate_proj.weight": "model-00027-of-00041.safetensors", "model.layers.72.mlp.gate_proj.weight": "model-00027-of-00041.safetensors", "model.layers.71.mlp.gate_proj.weight": "model-00027-of-00041.safetensors", "model.layers.70.mlp.gate_proj.weight": "model-00027-of-00041.safetensors", "model.layers.69.mlp.gate_proj.weight": "model-00027-of-00041.safetensors", "model.layers.68.mlp.gate_proj.weight": "model-00028-of-00041.safetensors", "model.layers.67.mlp.gate_proj.weight": "model-00028-of-00041.safetensors", "model.layers.66.mlp.gate_proj.weight": "model-00028-of-00041.safetensors", "model.layers.65.mlp.gate_proj.weight": "model-00028-of-00041.safetensors", "model.layers.64.mlp.gate_proj.weight": "model-00028-of-00041.safetensors", "model.layers.63.mlp.gate_proj.weight": "model-00028-of-00041.safetensors", "model.layers.62.mlp.gate_proj.weight": "model-00028-of-00041.safetensors", "model.layers.61.mlp.gate_proj.weight": "model-00028-of-00041.safetensors", "model.layers.60.mlp.gate_proj.weight": "model-00029-of-00041.safetensors", "model.layers.59.mlp.gate_proj.weight": "model-00029-of-00041.safetensors", "model.layers.58.mlp.gate_proj.weight": "model-00029-of-00041.safetensors", "model.layers.57.mlp.gate_proj.weight": "model-00029-of-00041.safetensors", "model.layers.56.mlp.gate_proj.weight": "model-00029-of-00041.safetensors", "model.layers.55.mlp.gate_proj.weight": "model-00029-of-00041.safetensors", "model.layers.54.mlp.gate_proj.weight": "model-00029-of-00041.safetensors", "model.layers.53.mlp.gate_proj.weight": "model-00029-of-00041.safetensors", "model.layers.52.mlp.gate_proj.weight": "model-00030-of-00041.safetensors", "model.layers.51.mlp.gate_proj.weight": "model-00030-of-00041.safetensors", "model.layers.50.mlp.gate_proj.weight": "model-00030-of-00041.safetensors", 
"model.layers.49.mlp.gate_proj.weight": "model-00030-of-00041.safetensors", "model.layers.48.mlp.gate_proj.weight": "model-00030-of-00041.safetensors", "model.layers.87.mlp.up_proj.weight": "model-00030-of-00041.safetensors", "model.layers.86.mlp.up_proj.weight": "model-00030-of-00041.safetensors", "model.layers.85.mlp.up_proj.weight": "model-00030-of-00041.safetensors", "model.layers.84.mlp.up_proj.weight": "model-00031-of-00041.safetensors", "model.layers.83.mlp.up_proj.weight": "model-00031-of-00041.safetensors", "model.layers.82.mlp.up_proj.weight": "model-00031-of-00041.safetensors", "model.layers.81.mlp.up_proj.weight": "model-00031-of-00041.safetensors", "model.layers.80.mlp.up_proj.weight": "model-00031-of-00041.safetensors", "model.layers.79.mlp.up_proj.weight": "model-00031-of-00041.safetensors", "model.layers.78.mlp.up_proj.weight": "model-00031-of-00041.safetensors", "model.layers.77.mlp.up_proj.weight": "model-00031-of-00041.safetensors", "model.layers.76.mlp.up_proj.weight": "model-00032-of-00041.safetensors", "model.layers.75.mlp.up_proj.weight": "model-00032-of-00041.safetensors", "model.layers.74.mlp.up_proj.weight": "model-00032-of-00041.safetensors", "model.layers.73.mlp.up_proj.weight": "model-00032-of-00041.safetensors", "model.layers.72.mlp.up_proj.weight": "model-00032-of-00041.safetensors", "model.layers.71.mlp.up_proj.weight": "model-00032-of-00041.safetensors", "model.layers.70.mlp.up_proj.weight": "model-00032-of-00041.safetensors", "model.layers.69.mlp.up_proj.weight": "model-00032-of-00041.safetensors", "model.layers.68.mlp.up_proj.weight": "model-00033-of-00041.safetensors", "model.layers.67.mlp.up_proj.weight": "model-00033-of-00041.safetensors", "model.layers.66.mlp.up_proj.weight": "model-00033-of-00041.safetensors", "model.layers.65.mlp.up_proj.weight": "model-00033-of-00041.safetensors", "model.layers.64.mlp.up_proj.weight": "model-00033-of-00041.safetensors", "model.layers.63.mlp.up_proj.weight": 
"model-00033-of-00041.safetensors", "model.layers.62.mlp.up_proj.weight": "model-00033-of-00041.safetensors", "model.layers.61.mlp.up_proj.weight": "model-00033-of-00041.safetensors", "model.layers.60.mlp.up_proj.weight": "model-00034-of-00041.safetensors", "model.layers.59.mlp.up_proj.weight": "model-00034-of-00041.safetensors", "model.layers.58.mlp.up_proj.weight": "model-00034-of-00041.safetensors", "model.layers.57.mlp.up_proj.weight": "model-00034-of-00041.safetensors", "model.layers.56.mlp.up_proj.weight": "model-00034-of-00041.safetensors", "model.layers.55.mlp.up_proj.weight": "model-00034-of-00041.safetensors", "model.layers.54.mlp.up_proj.weight": "model-00034-of-00041.safetensors", "model.layers.53.mlp.up_proj.weight": "model-00034-of-00041.safetensors", "model.layers.52.mlp.up_proj.weight": "model-00035-of-00041.safetensors", "model.layers.51.mlp.up_proj.weight": "model-00035-of-00041.safetensors", "model.layers.50.mlp.up_proj.weight": "model-00035-of-00041.safetensors", "model.layers.49.mlp.up_proj.weight": "model-00035-of-00041.safetensors", "model.layers.48.mlp.up_proj.weight": "model-00035-of-00041.safetensors", "model.layers.87.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.86.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.85.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.84.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.83.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.82.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.81.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.80.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.79.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.78.post_attention_layernorm.weight": 
"model-00035-of-00041.safetensors", "model.layers.77.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.76.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.75.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.74.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.73.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.72.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.71.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.70.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.69.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.68.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.67.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.66.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.65.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.64.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.63.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.62.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.61.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.60.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.59.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.58.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.57.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.56.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", 
"model.layers.55.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.54.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.53.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.52.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.51.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.50.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.49.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.48.post_attention_layernorm.weight": "model-00035-of-00041.safetensors", "model.layers.87.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.86.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.85.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.84.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.83.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.82.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.81.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.80.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.79.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.78.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.77.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.76.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.75.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.74.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.73.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.72.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", 
"model.layers.71.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.70.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.69.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.68.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.67.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.66.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.65.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.64.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.63.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.62.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.61.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.60.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.59.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.58.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.57.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.56.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.55.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.54.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.53.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.52.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.51.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.50.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.49.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.48.self_attn.k_proj.weight": "model-00035-of-00041.safetensors", "model.layers.87.self_attn.o_proj.weight": "model-00035-of-00041.safetensors", 
"model.layers.86.self_attn.o_proj.weight": "model-00035-of-00041.safetensors", "model.layers.85.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.84.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.83.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.82.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.81.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.80.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.79.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.78.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.77.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.76.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.75.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.74.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.73.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.72.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.71.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.70.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.69.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.68.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.67.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.66.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.65.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.64.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.63.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.62.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", 
"model.layers.61.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.60.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.59.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.58.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.57.self_attn.o_proj.weight": "model-00036-of-00041.safetensors", "model.layers.56.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.55.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.54.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.53.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.52.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.51.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.50.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.49.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.48.self_attn.o_proj.weight": "model-00037-of-00041.safetensors", "model.layers.87.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.86.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.85.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.84.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.83.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.82.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.81.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.80.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.79.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.78.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.77.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", 
"model.layers.76.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.75.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.74.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.73.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.72.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.71.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.70.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.69.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.68.self_attn.q_proj.weight": "model-00037-of-00041.safetensors", "model.layers.67.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.66.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.65.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.64.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.63.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.62.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.61.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.60.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.59.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.58.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.57.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.56.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.55.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.54.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.53.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.52.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", 
"model.layers.51.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.50.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.49.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.48.self_attn.q_proj.weight": "model-00038-of-00041.safetensors", "model.layers.87.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.86.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.85.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.84.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.83.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.82.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.81.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.80.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.79.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.78.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.77.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.76.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.75.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.74.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.73.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.72.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.71.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.70.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.69.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.68.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.67.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", 
"model.layers.66.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.65.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.64.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.63.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.62.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.61.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.60.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.59.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.58.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.57.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.56.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.55.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.54.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.53.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.52.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.51.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.50.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.49.self_attn.v_proj.weight": "model-00038-of-00041.safetensors", "model.layers.48.self_attn.v_proj.weight": "model-00039-of-00041.safetensors", "model.layers.5.input_layernorm.weight": "model-00039-of-00041.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00039-of-00041.safetensors", "model.layers.5.mlp.gate_proj.weight": "model-00039-of-00041.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00039-of-00041.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00039-of-00041.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00039-of-00041.safetensors", 
"model.layers.5.self_attn.o_proj.weight": "model-00039-of-00041.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00039-of-00041.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00039-of-00041.safetensors", "model.layers.6.input_layernorm.weight": "model-00039-of-00041.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00039-of-00041.safetensors", "model.layers.6.mlp.gate_proj.weight": "model-00039-of-00041.safetensors", "model.layers.6.mlp.up_proj.weight": "model-00039-of-00041.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00039-of-00041.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00039-of-00041.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00039-of-00041.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00039-of-00041.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00039-of-00041.safetensors", "model.layers.7.input_layernorm.weight": "model-00039-of-00041.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00039-of-00041.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00040-of-00041.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00040-of-00041.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00040-of-00041.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00040-of-00041.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00040-of-00041.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00040-of-00041.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00040-of-00041.safetensors", "model.layers.8.input_layernorm.weight": "model-00040-of-00041.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00040-of-00041.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00040-of-00041.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00040-of-00041.safetensors", "model.layers.8.post_attention_layernorm.weight": 
"model-00040-of-00041.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00040-of-00041.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00040-of-00041.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00040-of-00041.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00040-of-00041.safetensors", "model.layers.9.input_layernorm.weight": "model-00040-of-00041.safetensors", "model.layers.9.mlp.down_proj.weight": "model-00040-of-00041.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00040-of-00041.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00041-of-00041.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00041-of-00041.safetensors", "model.layers.9.self_attn.k_proj.weight": "model-00041-of-00041.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00041-of-00041.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00041-of-00041.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00041-of-00041.safetensors", "model.norm.weight": "model-00041-of-00041.safetensors"}}
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{{ '### Instruction:\n' + message['content'] + '\n### Response:\n' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% elif message['role'] == 'system' %}{{ '### System:\n' + message['content'] + '\n' }}{% endif %}{% endfor %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 32768,
+   "pad_token": "<unk>",
+   "padding_side": "left",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
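The `chat_template` in tokenizer_config.json defines an Alpaca-style prompt layout. A plain-Python sketch that mirrors the Jinja logic of that template (this is an illustrative equivalent, not the tokenizer's own `apply_chat_template`): user turns become `### Instruction:` blocks, system turns become `### System:` blocks, and assistant turns are closed with the EOS token.

```python
# Plain-Python equivalent of the Jinja chat_template above (a sketch):
# mirrors the per-role branches of the template, using the configured EOS token.
EOS = "</s>"

def render_chat(messages):
    """Render a list of {role, content} messages into a single prompt string."""
    out = []
    for m in messages:
        if m["role"] == "user":
            out.append("### Instruction:\n" + m["content"] + "\n### Response:\n")
        elif m["role"] == "assistant":
            out.append(m["content"] + EOS)
        elif m["role"] == "system":
            out.append("### System:\n" + m["content"] + "\n")
    return "".join(out)

prompt = render_chat([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
# ### System:
# You are helpful.
# ### Instruction:
# Hello
# ### Response:
```

Note that the template itself does not prepend `<s>`; the BOS token is added at tokenization time because `add_bos_token` is `true`.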