nicoboss committed
Commit 9629b9e
1 Parent(s): 03484ae

initial upload

README.md CHANGED
@@ -1,3 +1,485 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model: mistralai/Mistral-Nemo-Base-2407
+ tags:
+ - generated_from_trainer
+ - axolotl
+ datasets:
+ - cognitivecomputations/Dolphin-2.9
+ - teknium/OpenHermes-2.5
+ - m-a-p/CodeFeedback-Filtered-Instruction
+ - cognitivecomputations/dolphin-coder
+ - cognitivecomputations/samantha-data
+ - microsoft/orca-math-word-problems-200k
+ - Locutusque/function-calling-chatml
+ - internlm/Agent-FLAN
+ ---
+
+ # Dolphin 2.9.3 Mistral Nemo 12b 🐬
+
+ Curated and trained by Eric Hartford and Cognitive Computations
+
+ [![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/h3K4XGj2RH)
+ Discord: https://discord.gg/h3K4XGj2RH
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />
+
+ Our appreciation goes to the sponsors of Dolphin 2.9.3:
+ - [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xL40S node
+
+ This model is based on mistralai/Mistral-Nemo-Base-2407 and is governed by the Apache 2.0 license.
+
+ The base model has a 128K context window; our fine-tuning used an 8192-token sequence length.
+
+ Dolphin 2.9.3 uses the ChatML prompt template format.
+
+ Example:
+
+ ```
+ <|im_start|>system
+ You are Dolphin, a helpful AI assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+
+ ```
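+
+ You can also let the tokenizer build this prompt for you. A minimal sketch, assuming the `transformers` library and that this repository ships the ChatML chat template (the repo id below is illustrative):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Illustrative repo id; substitute the actual repository name.
+ repo = "cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b"
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+
+ messages = [
+     {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
+     {"role": "user", "content": "Write a haiku about the sea."},
+ ]
+
+ # add_generation_prompt=True appends the trailing "<|im_start|>assistant" turn.
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ print(prompt)
+ ```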
+
+ Dolphin 2.9.3 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.
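+
+ A minimal end-to-end generation sketch with `transformers`, assuming bfloat16 weights and an available GPU (repo id illustrative):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ repo = "cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b"  # illustrative id
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+ model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")
+
+ messages = [
+     {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
+     {"role": "user", "content": "Summarize what function calling means for an LLM."},
+ ]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ # <|im_end|> is this finetune's EOS token (see special_tokens_map.json below).
+ outputs = model.generate(inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id)
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```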
+
+ Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service: it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.
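+
+ What that alignment layer looks like is up to you. One minimal sketch is a policy check wrapped around the model call (the `is_disallowed` classifier here is a hypothetical placeholder, not something this repository provides):
+
+ ```python
+ def is_disallowed(text: str) -> bool:
+     """Hypothetical policy check; swap in your own moderation model or rules."""
+     banned_topics = ("how to make a weapon",)  # illustrative only
+     return any(topic in text.lower() for topic in banned_topics)
+
+ def guarded_chat(user_message: str, generate) -> str:
+     """Refuse before generating; `generate` is your own model-call function."""
+     if is_disallowed(user_message):
+         return "I can't help with that."
+     reply = generate(user_message)
+     # Optionally screen the model's output as well.
+     return "I can't help with that." if is_disallowed(reply) else reply
+ ```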
+
+ Dolphin is licensed under the Apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT-4, among other models.
+
+ ## Evals
+
+ <details><summary>See evals</summary>
+
+ ```
+ | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
+ |-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
+ |leaderboard |N/A |none | 0|acc |↑ |0.3437|± |0.0043|
+ | | |none | 0|acc_norm |↑ |0.5076|± |0.0053|
+ | | |none | 0|exact_match |↑ |0.0536|± |0.0061|
+ | | |none | 0|inst_level_loose_acc |↑ |0.4388|± |N/A |
+ | | |none | 0|inst_level_strict_acc |↑ |0.3741|± |N/A |
+ | | |none | 0|prompt_level_loose_acc |↑ |0.3105|± |0.0199|
+ | | |none | 0|prompt_level_strict_acc|↑ |0.2477|± |0.0186|
+ | - leaderboard_bbh |N/A |none | 3|acc_norm |↑ |0.5549|± |0.0061|
+ | - leaderboard_bbh_boolean_expressions | 0|none | 3|acc_norm |↑ |0.8640|± |0.0217|
+ | - leaderboard_bbh_causal_judgement | 0|none | 3|acc_norm |↑ |0.6417|± |0.0352|
+ | - leaderboard_bbh_date_understanding | 0|none | 3|acc_norm |↑ |0.6080|± |0.0309|
+ | - leaderboard_bbh_disambiguation_qa | 0|none | 3|acc_norm |↑ |0.6480|± |0.0303|
+ | - leaderboard_bbh_formal_fallacies | 0|none | 3|acc_norm |↑ |0.5360|± |0.0316|
+ | - leaderboard_bbh_geometric_shapes | 0|none | 3|acc_norm |↑ |0.5240|± |0.0316|
+ | - leaderboard_bbh_hyperbaton | 0|none | 3|acc_norm |↑ |0.6440|± |0.0303|
+ | - leaderboard_bbh_logical_deduction_five_objects | 0|none | 3|acc_norm |↑ |0.4600|± |0.0316|
+ | - leaderboard_bbh_logical_deduction_seven_objects | 0|none | 3|acc_norm |↑ |0.4680|± |0.0316|
+ | - leaderboard_bbh_logical_deduction_three_objects | 0|none | 3|acc_norm |↑ |0.7000|± |0.0290|
+ | - leaderboard_bbh_movie_recommendation | 0|none | 3|acc_norm |↑ |0.8160|± |0.0246|
+ | - leaderboard_bbh_navigate | 0|none | 3|acc_norm |↑ |0.6040|± |0.0310|
+ | - leaderboard_bbh_object_counting | 0|none | 3|acc_norm |↑ |0.3680|± |0.0306|
+ | - leaderboard_bbh_penguins_in_a_table | 0|none | 3|acc_norm |↑ |0.5548|± |0.0413|
+ | - leaderboard_bbh_reasoning_about_colored_objects | 0|none | 3|acc_norm |↑ |0.6320|± |0.0306|
+ | - leaderboard_bbh_ruin_names | 0|none | 3|acc_norm |↑ |0.7440|± |0.0277|
+ | - leaderboard_bbh_salient_translation_error_detection | 0|none | 3|acc_norm |↑ |0.5280|± |0.0316|
+ | - leaderboard_bbh_snarks | 0|none | 3|acc_norm |↑ |0.6292|± |0.0363|
+ | - leaderboard_bbh_sports_understanding | 0|none | 3|acc_norm |↑ |0.8040|± |0.0252|
+ | - leaderboard_bbh_temporal_sequences | 0|none | 3|acc_norm |↑ |0.4680|± |0.0316|
+ | - leaderboard_bbh_tracking_shuffled_objects_five_objects | 0|none | 3|acc_norm |↑ |0.2160|± |0.0261|
+ | - leaderboard_bbh_tracking_shuffled_objects_seven_objects| 0|none | 3|acc_norm |↑ |0.1160|± |0.0203|
+ | - leaderboard_bbh_tracking_shuffled_objects_three_objects| 0|none | 3|acc_norm |↑ |0.3000|± |0.0290|
+ | - leaderboard_bbh_web_of_lies | 0|none | 3|acc_norm |↑ |0.4880|± |0.0317|
+ | - leaderboard_gpqa |N/A |none | 0|acc_norm |↑ |0.3146|± |0.0135|
+ | - leaderboard_gpqa_diamond | 1|none | 0|acc_norm |↑ |0.3182|± |0.0332|
+ | - leaderboard_gpqa_extended | 1|none | 0|acc_norm |↑ |0.3187|± |0.0200|
+ | - leaderboard_gpqa_main | 1|none | 0|acc_norm |↑ |0.3080|± |0.0218|
+ | - leaderboard_ifeval | 2|none | 0|inst_level_loose_acc |↑ |0.4388|± |N/A |
+ | | |none | 0|inst_level_strict_acc |↑ |0.3741|± |N/A |
+ | | |none | 0|prompt_level_loose_acc |↑ |0.3105|± |0.0199|
+ | | |none | 0|prompt_level_strict_acc|↑ |0.2477|± |0.0186|
+ | - leaderboard_math_algebra_hard | 1|none | 4|exact_match |↑ |0.0749|± |0.0150|
+ | - leaderboard_math_counting_and_prob_hard | 1|none | 4|exact_match |↑ |0.0244|± |0.0140|
+ | - leaderboard_math_geometry_hard | 1|none | 4|exact_match |↑ |0.0227|± |0.0130|
+ | - leaderboard_math_hard |N/A |none | 4|exact_match |↑ |0.0536|± |0.0061|
+ | - leaderboard_math_intermediate_algebra_hard | 1|none | 4|exact_match |↑ |0.0250|± |0.0093|
+ | - leaderboard_math_num_theory_hard | 1|none | 4|exact_match |↑ |0.0390|± |0.0156|
+ | - leaderboard_math_prealgebra_hard | 1|none | 4|exact_match |↑ |0.1295|± |0.0242|
+ | - leaderboard_math_precalculus_hard | 1|none | 4|exact_match |↑ |0.0296|± |0.0146|
+ | - leaderboard_mmlu_pro | 0.1|none | 5|acc |↑ |0.3437|± |0.0043|
+ | - leaderboard_musr |N/A |none | 0|acc_norm |↑ |0.4511|± |0.0178|
+ | - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm |↑ |0.5880|± |0.0312|
+ | - leaderboard_musr_object_placements | 1|none | 0|acc_norm |↑ |0.3438|± |0.0297|
+ | - leaderboard_musr_team_allocation | 1|none | 0|acc_norm |↑ |0.4240|± |0.0313|
+
+ | Groups |Version|Filter|n-shot| Metric | |Value | |Stderr|
+ |------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
+ |leaderboard |N/A |none | 0|acc |↑ |0.3437|± |0.0043|
+ | | |none | 0|acc_norm |↑ |0.5076|± |0.0053|
+ | | |none | 0|exact_match |↑ |0.0536|± |0.0061|
+ | | |none | 0|inst_level_loose_acc |↑ |0.4388|± |N/A |
+ | | |none | 0|inst_level_strict_acc |↑ |0.3741|± |N/A |
+ | | |none | 0|prompt_level_loose_acc |↑ |0.3105|± |0.0199|
+ | | |none | 0|prompt_level_strict_acc|↑ |0.2477|± |0.0186|
+ | - leaderboard_bbh |N/A |none | 3|acc_norm |↑ |0.5549|± |0.0061|
+ | - leaderboard_gpqa |N/A |none | 0|acc_norm |↑ |0.3146|± |0.0135|
+ | - leaderboard_math_hard|N/A |none | 4|exact_match |↑ |0.0536|± |0.0061|
+ | - leaderboard_musr |N/A |none | 0|acc_norm |↑ |0.4511|± |0.0178|
+ ```
+
+ </details><br>
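+
+ These tables look like output from EleutherAI's lm-evaluation-harness `leaderboard` task group. A hedged sketch of how such a run could be launched (task names and availability depend on your installed lm-eval version; repo id illustrative):
+
+ ```python
+ import lm_eval
+
+ results = lm_eval.simple_evaluate(
+     model="hf",
+     model_args="pretrained=cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b,dtype=bfloat16",
+     tasks=["leaderboard"],
+     batch_size=8,
+ )
+ print(results["results"])
+ ```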
+
+ ## Training
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.1`
+ ```yaml
+ base_model: /workspace/models/Mistral-Nemo-Base-2407
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ load_in_8bit: false
+ # load_in_4bit: true
+ strict: false
+
+ datasets:
+   - path: /workspace/datasets/dolphin-2.9.3/dolphin201-sharegpt2.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/SystemChat_filtered_sharegpt.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/SystemChat_multilingual_sharegpt.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/dolphin-coder-translate-sharegpt2.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/dolphin-coder-codegen-sharegpt2.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/not_samantha_norefusals.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/Orca-Math-resort-unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/agent_instruct_react_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/toolbench_instruct_j1s1_3k_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/toolbench_negative_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/toolbench_react_10p_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/toolbench_tflan_cot_30p_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9.3/openhermes200k_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+
+ chat_template: chatml
+ # adapter: qlora
+ # lora_r: 128
+ # lora_alpha: 16
+ # lora_modules_to_save: [embed_tokens, lm_head]
+ # lora_dropout: 0.05
+ # lora_target_linear: true
+
+
+ unfrozen_parameters:
+ - ^lm_head.weight$
+ - ^model.embed_tokens.weight$
+ - input_layernorm
+ - model.norm
+ - post_attention_layernorm
+ - self_attn.rotary_emb
+ # mlp.down_proj layers
+ - model.layers.0.mlp.down_proj
+ - model.layers.1.mlp.down_proj
+ - model.layers.4.mlp.down_proj
+ - model.layers.37.mlp.down_proj
+ - model.layers.24.mlp.down_proj
+ - model.layers.2.mlp.down_proj
+ - model.layers.38.mlp.down_proj
+ - model.layers.35.mlp.down_proj
+ - model.layers.25.mlp.down_proj
+ - model.layers.6.mlp.down_proj
+ - model.layers.22.mlp.down_proj
+ - model.layers.23.mlp.down_proj
+ - model.layers.3.mlp.down_proj
+ - model.layers.21.mlp.down_proj
+ - model.layers.5.mlp.down_proj
+ - model.layers.28.mlp.down_proj
+ - model.layers.20.mlp.down_proj
+ - model.layers.26.mlp.down_proj
+ - model.layers.19.mlp.down_proj
+ - model.layers.34.mlp.down_proj
+ # mlp.gate_proj layers
+ - model.layers.2.mlp.gate_proj
+ - model.layers.1.mlp.gate_proj
+ - model.layers.3.mlp.gate_proj
+ - model.layers.5.mlp.gate_proj
+ - model.layers.4.mlp.gate_proj
+ - model.layers.35.mlp.gate_proj
+ - model.layers.36.mlp.gate_proj
+ - model.layers.37.mlp.gate_proj
+ - model.layers.38.mlp.gate_proj
+ - model.layers.34.mlp.gate_proj
+ - model.layers.33.mlp.gate_proj
+ - model.layers.8.mlp.gate_proj
+ - model.layers.32.mlp.gate_proj
+ - model.layers.6.mlp.gate_proj
+ - model.layers.28.mlp.gate_proj
+ - model.layers.26.mlp.gate_proj
+ - model.layers.30.mlp.gate_proj
+ - model.layers.23.mlp.gate_proj
+ - model.layers.29.mlp.gate_proj
+ - model.layers.27.mlp.gate_proj
+ # mlp.up_proj layers
+ - model.layers.3.mlp.up_proj
+ - model.layers.4.mlp.up_proj
+ - model.layers.6.mlp.up_proj
+ - model.layers.2.mlp.up_proj
+ - model.layers.5.mlp.up_proj
+ - model.layers.8.mlp.up_proj
+ - model.layers.10.mlp.up_proj
+ - model.layers.9.mlp.up_proj
+ - model.layers.7.mlp.up_proj
+ - model.layers.0.mlp.up_proj
+ - model.layers.17.mlp.up_proj
+ - model.layers.15.mlp.up_proj
+ - model.layers.22.mlp.up_proj
+ - model.layers.18.mlp.up_proj
+ - model.layers.16.mlp.up_proj
+ - model.layers.11.mlp.up_proj
+ - model.layers.21.mlp.up_proj
+ - model.layers.23.mlp.up_proj
+ - model.layers.20.mlp.up_proj
+ - model.layers.27.mlp.up_proj
+ # self_attn.k_proj layers
+ - model.layers.30.self_attn.k_proj
+ - model.layers.27.self_attn.k_proj
+ - model.layers.25.self_attn.k_proj
+ - model.layers.33.self_attn.k_proj
+ - model.layers.26.self_attn.k_proj
+ - model.layers.31.self_attn.k_proj
+ - model.layers.35.self_attn.k_proj
+ - model.layers.39.self_attn.k_proj
+ - model.layers.22.self_attn.k_proj
+ - model.layers.24.self_attn.k_proj
+ - model.layers.21.self_attn.k_proj
+ - model.layers.28.self_attn.k_proj
+ - model.layers.23.self_attn.k_proj
+ - model.layers.36.self_attn.k_proj
+ - model.layers.20.self_attn.k_proj
+ - model.layers.37.self_attn.k_proj
+ - model.layers.29.self_attn.k_proj
+ - model.layers.32.self_attn.k_proj
+ - model.layers.16.self_attn.k_proj
+ - model.layers.18.self_attn.k_proj
+ # self_attn.o_proj layers
+ - model.layers.7.self_attn.o_proj
+ - model.layers.6.self_attn.o_proj
+ - model.layers.9.self_attn.o_proj
+ - model.layers.5.self_attn.o_proj
+ - model.layers.27.self_attn.o_proj
+ - model.layers.26.self_attn.o_proj
+ - model.layers.4.self_attn.o_proj
+ - model.layers.31.self_attn.o_proj
+ - model.layers.8.self_attn.o_proj
+ - model.layers.16.self_attn.o_proj
+ - model.layers.3.self_attn.o_proj
+ - model.layers.10.self_attn.o_proj
+ - model.layers.18.self_attn.o_proj
+ - model.layers.33.self_attn.o_proj
+ - model.layers.17.self_attn.o_proj
+ - model.layers.32.self_attn.o_proj
+ - model.layers.30.self_attn.o_proj
+ - model.layers.2.self_attn.o_proj
+ - model.layers.15.self_attn.o_proj
+ - model.layers.11.self_attn.o_proj
+ # self_attn.q_proj layers
+ - model.layers.14.self_attn.q_proj
+ - model.layers.11.self_attn.q_proj
+ - model.layers.15.self_attn.q_proj
+ - model.layers.9.self_attn.q_proj
+ - model.layers.8.self_attn.q_proj
+ - model.layers.18.self_attn.q_proj
+ - model.layers.12.self_attn.q_proj
+ - model.layers.13.self_attn.q_proj
+ - model.layers.19.self_attn.q_proj
+ - model.layers.16.self_attn.q_proj
+ - model.layers.10.self_attn.q_proj
+ - model.layers.17.self_attn.q_proj
+ - model.layers.7.self_attn.q_proj
+ - model.layers.5.self_attn.q_proj
+ - model.layers.20.self_attn.q_proj
+ - model.layers.3.self_attn.q_proj
+ - model.layers.26.self_attn.q_proj
+ - model.layers.27.self_attn.q_proj
+ - model.layers.28.self_attn.q_proj
+ - model.layers.33.self_attn.q_proj
+ # self_attn.v_proj layers
+ - model.layers.27.self_attn.v_proj
+ - model.layers.20.self_attn.v_proj
+ - model.layers.24.self_attn.v_proj
+ - model.layers.25.self_attn.v_proj
+ - model.layers.30.self_attn.v_proj
+ - model.layers.2.self_attn.v_proj
+ - model.layers.23.self_attn.v_proj
+ - model.layers.22.self_attn.v_proj
+ - model.layers.26.self_attn.v_proj
+ - model.layers.33.self_attn.v_proj
+ - model.layers.37.self_attn.v_proj
+ - model.layers.7.self_attn.v_proj
+ - model.layers.4.self_attn.v_proj
+ - model.layers.18.self_attn.v_proj
+ - model.layers.31.self_attn.v_proj
+ - model.layers.17.self_attn.v_proj
+ - model.layers.35.self_attn.v_proj
+ - model.layers.32.self_attn.v_proj
+ - model.layers.21.self_attn.v_proj
+ - model.layers.3.self_attn.v_proj
+
+
+
+ dataset_prepared_path: /workspace/axolotl/dolph-2.9.3-nemo-prepared
+ val_set_size: 0.01
+ output_dir: /workspace/axolotl/dolphin-2.9.3-mistral-nemo
+
+ sequence_len: 8192
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ wandb_project: dolphin-2.9.3-Mistral-nemo
+ wandb_watch:
+ wandb_run_id:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 16
+ micro_batch_size: 1
+ num_epochs: 3
+ optimizer: adamw_torch
+ lr_scheduler: cosine
+ learning_rate: 5e-6
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32:
+
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 100
+ # evals_per_epoch: 4
+ eval_table_size:
+ saves_per_epoch: 1
+ save_total_limit: 2
+ save_steps:
+ debug:
+ deepspeed: deepspeed_configs/zero3_bf16.json
+ weight_decay: 0.1
+ special_tokens:
+   eos_token: "<|im_end|>"
+   pad_token: "<pad>"
+   bos_token: "<s>"
+   unk_token: "<unk>"
+ tokens:
+ - "<|im_start|>"
+
+
+ # fsdp:
+ # - full_shard
+ # - auto_wrap
+ # fsdp_config:
+ # fsdp_limit_all_gathers: true
+ # fsdp_sync_module_states: true
+ # fsdp_offload_params: true
+ # fsdp_use_orig_params: false
+ # fsdp_cpu_ram_efficient_loading: true
+ # fsdp_transformer_layer_cls_to_wrap: MixtralSparseMoeBlock
+ # fsdp_state_dict_type: FULL_STATE_DICT
+ # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+ # fsdp_sharding_strategy: FULL_SHARD
+ # fsdp_forward_prefetch: false
+ # fsdp_backward_prefetch: BACKWARD_PRE
+ ```
+
+ </details><br>
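+
+ The `unfrozen_parameters` list in the config implements targeted fine-tuning: every weight stays frozen except the embeddings, head, norms, and a hand-picked subset of projection layers. A minimal sketch of the same idea outside axolotl (pattern list abbreviated; assumes a loaded `model`):
+
+ ```python
+ import re
+
+ # Abbreviated for illustration; the full pattern list is in the config above.
+ unfrozen_patterns = [
+     r"^lm_head.weight$",
+     r"^model.embed_tokens.weight$",
+     r"input_layernorm",
+     r"post_attention_layernorm",
+     r"model.layers.0.mlp.down_proj",
+ ]
+
+ def apply_unfrozen(model, patterns):
+     """Freeze every parameter whose name matches none of the patterns."""
+     for name, param in model.named_parameters():
+         param.requires_grad = any(re.search(p, name) for p in patterns)
+ ```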
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/ehartford/dolphin-2.9.3-Mistral-nemo/runs/c23odyoj)
+ # workspace/axolotl/dolphin-2.9.3-mistral-nemo
+
+ This model was fine-tuned from mistralai/Mistral-Nemo-Base-2407 on the datasets listed above.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5605
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (the effective batch size is derived in the sketch below):
+ - learning_rate: 5e-06
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 3
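+
+ A quick consistency check on the batch-size arithmetic (per-device micro-batches accumulated across 8 GPUs):
+
+ ```python
+ micro_batch_size = 1            # per-device batch
+ gradient_accumulation = 16
+ num_devices = 8
+
+ total_train_batch_size = micro_batch_size * gradient_accumulation * num_devices
+ assert total_train_batch_size == 128  # matches the reported value
+ ```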
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 0.5691 | 1.0162 | 983 | 0.5734 |
+ | 0.5335 | 2.0174 | 1968 | 0.5609 |
+ | 0.5297 | 2.9639 | 2901 | 0.5605 |
+
+
+ ### Framework versions
+
+ - Transformers 4.43.0.dev0
+ - Pytorch 2.2.2+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
added_tokens.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "<|im_end|>": 131072,
+   "<|im_start|>": 131073
+ }
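
These ChatML markers sit at the top of the enlarged vocabulary (`vocab_size` is 131074 in config.json). A quick check of the mapping, assuming the tokenizer from this repository (repo id illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b")  # illustrative
print(tokenizer.convert_tokens_to_ids("<|im_end|>"))    # expected: 131072
print(tokenizer.convert_tokens_to_ids("<|im_start|>"))  # expected: 131073
```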
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_name_or_path": "mistralai/Mistral-Nemo-Base-2407",
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 131072,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 1024000,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 40,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.43.0.dev0",
+   "use_cache": false,
+   "vocab_size": 131074
+ }
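
These dimensions account for the "12b" in the model name. A back-of-the-envelope parameter count from the config (untied embeddings; grouped-query attention with 8 KV heads):

```python
vocab, hidden, layers = 131074, 5120, 40
inter, heads, kv_heads, head_dim = 14336, 32, 8, 128

embed = 2 * vocab * hidden                    # embed_tokens + lm_head (untied)
attn = (2 * hidden * heads * head_dim
        + 2 * hidden * kv_heads * head_dim)   # q/o plus k/v projections
mlp = 3 * hidden * inter                      # gate, up, down
total = embed + layers * (attn + mlp)
print(f"{total / 1e9:.1f}B params")           # ~12.2B; ~24.5 GB in bfloat16
```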
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "do_sample": true,
+   "eos_token_id": 2,
+   "transformers_version": "4.43.0.dev0"
+ }
model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:813088808a75abb454357edfb9c4eeaa16f38cc8f5970da30bf3d264e8041a81
+ size 4865542976
model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ed3353865c3d5f9b77556bdef5d105e29e39d65114ad8018f674c67fc1e412c
+ size 4907529424
model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7603d5cfab122dcb68aa203574e20dd1eada2006f5a2f93c62464124cdd45c9b
+ size 4907529456
model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bed8c12c2f060f3a04c2ed20f0feef454e3681b4d2c810a078450047a24e427
+ size 4907529456
model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c9b9230e29c247e9f75ddddc1318faa8c7c9a14e64f3d822dc6bdb33801d091e
+ size 4907516752
model.safetensors.index.json ADDED
@@ -0,0 +1,370 @@
+ {
+   "metadata": {
+     "total_size": 24495605760
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00005-of-00005.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.28.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.29.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.3.input_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.3.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.3.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.30.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.30.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.30.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.30.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.30.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.30.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.31.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.input_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.32.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.33.input_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.33.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.33.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.33.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.33.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.33.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.33.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.33.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.33.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
+     "model.layers.34.input_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.34.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.34.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.34.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.34.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.34.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.34.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.34.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.34.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.input_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.35.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.input_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.36.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.input_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.37.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.input_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.38.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.input_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.39.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
+     "model.layers.4.input_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.4.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.4.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.input_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.6.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.6.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.6.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
+     "model.layers.7.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.7.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.7.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.input_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
+     "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
+     "model.norm.weight": "model-00005-of-00005.safetensors"
+   }
+ }
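
The index maps every tensor name to its shard, so a tensor can be located without loading the model. A small sketch, assuming the files are downloaded locally:

```python
import json

with open("model.safetensors.index.json") as f:
    index = json.load(f)

print(index["metadata"]["total_size"])           # 24495605760 bytes (~24.5 GB)
print(index["weight_map"]["model.norm.weight"])  # model-00005-of-00005.safetensors
```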
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
The diff for this file is too large to render.
 
vocab.json ADDED
The diff for this file is too large to render.