Text Generation
Transformers
Safetensors
qwen2
Generated from Trainer
axolotl
conversational
Inference Endpoints
text-generation-inference
Crystalcareai committed on
Commit
b9fd220
1 Parent(s): 1b89d96

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes. See raw diff
Files changed (50)
  1. README.md +431 -0
  2. added_tokens.json +5 -0
  3. config.json +27 -0
  4. generation_config.json +7 -0
  5. merges.txt +0 -0
  6. model-00001-of-00049.safetensors +3 -0
  7. model-00002-of-00049.safetensors +3 -0
  8. model-00003-of-00049.safetensors +3 -0
  9. model-00004-of-00049.safetensors +3 -0
  10. model-00005-of-00049.safetensors +3 -0
  11. model-00006-of-00049.safetensors +3 -0
  12. model-00007-of-00049.safetensors +3 -0
  13. model-00008-of-00049.safetensors +3 -0
  14. model-00009-of-00049.safetensors +3 -0
  15. model-00010-of-00049.safetensors +3 -0
  16. model-00011-of-00049.safetensors +3 -0
  17. model-00012-of-00049.safetensors +3 -0
  18. model-00013-of-00049.safetensors +3 -0
  19. model-00014-of-00049.safetensors +3 -0
  20. model-00015-of-00049.safetensors +3 -0
  21. model-00016-of-00049.safetensors +3 -0
  22. model-00017-of-00049.safetensors +3 -0
  23. model-00018-of-00049.safetensors +3 -0
  24. model-00019-of-00049.safetensors +3 -0
  25. model-00020-of-00049.safetensors +3 -0
  26. model-00021-of-00049.safetensors +3 -0
  27. model-00022-of-00049.safetensors +3 -0
  28. model-00023-of-00049.safetensors +3 -0
  29. model-00024-of-00049.safetensors +3 -0
  30. model-00025-of-00049.safetensors +3 -0
  31. model-00026-of-00049.safetensors +3 -0
  32. model-00027-of-00049.safetensors +3 -0
  33. model-00028-of-00049.safetensors +3 -0
  34. model-00029-of-00049.safetensors +3 -0
  35. model-00030-of-00049.safetensors +3 -0
  36. model-00031-of-00049.safetensors +3 -0
  37. model-00032-of-00049.safetensors +3 -0
  38. model-00033-of-00049.safetensors +3 -0
  39. model-00034-of-00049.safetensors +3 -0
  40. model-00035-of-00049.safetensors +3 -0
  41. model-00036-of-00049.safetensors +3 -0
  42. model-00037-of-00049.safetensors +3 -0
  43. model-00038-of-00049.safetensors +3 -0
  44. model-00039-of-00049.safetensors +3 -0
  45. model-00040-of-00049.safetensors +3 -0
  46. model-00041-of-00049.safetensors +3 -0
  47. model-00042-of-00049.safetensors +3 -0
  48. model-00043-of-00049.safetensors +3 -0
  49. model-00044-of-00049.safetensors +3 -0
  50. model-00045-of-00049.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,431 @@
+ ---
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: qwen-out
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: /workspace/axolotl/qwen-checkpoint
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ # trust_remote_code: true
+
+ # load_in_8bit: true
+ # load_in_4bit: true
+ # strict: false
+
+ datasets:
+   - path: /workspace/datasets/dolphin-2.9/dolphin201-sharegpt2.jsonl
+     type: sharegpt
+     conversation: chatml
+   # - path: /workspace/datasets/dolphin-2.9/Ultrachat200kunfiltered.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/dolphin-coder-translate-sharegpt2.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/dolphin-coder-codegen-sharegpt2.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/not_samantha_norefusals.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/Orca-Math-resort-unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/agent_instruct_react_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/toolbench_negative_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   - path: /workspace/datasets/dolphin-2.9/openhermes200k_unfiltered.jsonl
+     type: sharegpt
+     conversation: chatml
+   # - path: /workspace/datasets/dolphin-2.9/SystemConversations.jsonl
+   #   type: sharegpt
+   #   conversation: chatml
+
+ chat_template: chatml
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.01
+ output_dir: ./qwen-out
+
+ # adapter: qlora
+ # lora_r: 16
+ # lora_alpha: 16
+ # lora_modules_to_save: [embed_tokens, lm_head]
+ # lora_dropout: 0.05
+ # lora_target_linear: false
+
+ unfrozen_parameters:
+ - ^lm_head.weight$
+ - ^model.embed_tokens.weight$
+ # input_layernorm layers
+ - model.layers.0.input_layernorm
+ - model.layers.1.input_layernorm
+ - model.layers.2.input_layernorm
+ - model.layers.3.input_layernorm
+ - model.layers.4.input_layernorm
+ - model.layers.5.input_layernorm
+ - model.layers.6.input_layernorm
+ - model.layers.7.input_layernorm
+ - model.layers.8.input_layernorm
+ - model.layers.9.input_layernorm
+ - model.layers.10.input_layernorm
+ - model.layers.11.input_layernorm
+ - model.layers.12.input_layernorm
+ - model.layers.13.input_layernorm
+ - model.layers.14.input_layernorm
+ - model.layers.15.input_layernorm
+ - model.layers.16.input_layernorm
+ - model.layers.17.input_layernorm
+ - model.layers.18.input_layernorm
+ - model.layers.19.input_layernorm
+ - model.layers.20.input_layernorm
+ - model.layers.21.input_layernorm
+ - model.layers.22.input_layernorm
+ - model.layers.23.input_layernorm
+ # lm_head layers
+ # mlp.down_proj layers
+ - model.layers.17.mlp.down_proj
+ - model.layers.18.mlp.down_proj
+ - model.layers.19.mlp.down_proj
+ - model.layers.20.mlp.down_proj
+ - model.layers.21.mlp.down_proj
+ - model.layers.22.mlp.down_proj
+ - model.layers.23.mlp.down_proj
+ - model.layers.24.mlp.down_proj
+ - model.layers.25.mlp.down_proj
+ - model.layers.26.mlp.down_proj
+ - model.layers.27.mlp.down_proj
+ - model.layers.28.mlp.down_proj
+ - model.layers.29.mlp.down_proj
+ - model.layers.30.mlp.down_proj
+ - model.layers.31.mlp.down_proj
+ - model.layers.32.mlp.down_proj
+ - model.layers.33.mlp.down_proj
+ - model.layers.34.mlp.down_proj
+ - model.layers.35.mlp.down_proj
+ - model.layers.36.mlp.down_proj
+ - model.layers.37.mlp.down_proj
+ - model.layers.38.mlp.down_proj
+ - model.layers.39.mlp.down_proj
+ - model.layers.40.mlp.down_proj
+ # mlp.gate_proj layers
+ - model.layers.51.mlp.gate_proj
+ - model.layers.50.mlp.gate_proj
+ - model.layers.53.mlp.gate_proj
+ - model.layers.52.mlp.gate_proj
+ - model.layers.49.mlp.gate_proj
+ - model.layers.45.mlp.gate_proj
+ - model.layers.46.mlp.gate_proj
+ - model.layers.47.mlp.gate_proj
+ - model.layers.57.mlp.gate_proj
+ - model.layers.48.mlp.gate_proj
+ - model.layers.56.mlp.gate_proj
+ - model.layers.41.mlp.gate_proj
+ - model.layers.54.mlp.gate_proj
+ - model.layers.43.mlp.gate_proj
+ - model.layers.44.mlp.gate_proj
+ - model.layers.60.mlp.gate_proj
+ - model.layers.55.mlp.gate_proj
+ - model.layers.40.mlp.gate_proj
+ - model.layers.42.mlp.gate_proj
+ - model.layers.58.mlp.gate_proj
+ - model.layers.36.mlp.gate_proj
+ - model.layers.37.mlp.gate_proj
+ - model.layers.38.mlp.gate_proj
+ - model.layers.39.mlp.gate_proj
+ # mlp.up_proj layers
+ - model.layers.50.mlp.up_proj
+ - model.layers.51.mlp.up_proj
+ - model.layers.41.mlp.up_proj
+ - model.layers.49.mlp.up_proj
+ - model.layers.43.mlp.up_proj
+ - model.layers.44.mlp.up_proj
+ - model.layers.40.mlp.up_proj
+ - model.layers.45.mlp.up_proj
+ - model.layers.47.mlp.up_proj
+ - model.layers.48.mlp.up_proj
+ - model.layers.46.mlp.up_proj
+ - model.layers.42.mlp.up_proj
+ - model.layers.39.mlp.up_proj
+ - model.layers.36.mlp.up_proj
+ - model.layers.37.mlp.up_proj
+ - model.layers.38.mlp.up_proj
+ - model.layers.56.mlp.up_proj
+ - model.layers.57.mlp.up_proj
+ - model.layers.53.mlp.up_proj
+ - model.layers.31.mlp.up_proj
+ - model.layers.32.mlp.up_proj
+ - model.layers.34.mlp.up_proj
+ - model.layers.35.mlp.up_proj
+ - model.layers.33.mlp.up_proj
+ # model.embed_tokens layers
+ # model.norm layers
+ # post_attention_layernorm layers
+ - model.layers.0.post_attention_layernorm
+ - model.layers.1.post_attention_layernorm
+ - model.layers.2.post_attention_layernorm
+ - model.layers.3.post_attention_layernorm
+ - model.layers.4.post_attention_layernorm
+ - model.layers.5.post_attention_layernorm
+ - model.layers.6.post_attention_layernorm
+ - model.layers.7.post_attention_layernorm
+ - model.layers.8.post_attention_layernorm
+ - model.layers.9.post_attention_layernorm
+ - model.layers.10.post_attention_layernorm
+ - model.layers.11.post_attention_layernorm
+ - model.layers.12.post_attention_layernorm
+ - model.layers.13.post_attention_layernorm
+ - model.layers.14.post_attention_layernorm
+ - model.layers.15.post_attention_layernorm
+ - model.layers.16.post_attention_layernorm
+ - model.layers.17.post_attention_layernorm
+ - model.layers.18.post_attention_layernorm
+ - model.layers.19.post_attention_layernorm
+ - model.layers.20.post_attention_layernorm
+ - model.layers.21.post_attention_layernorm
+ - model.layers.22.post_attention_layernorm
+ - model.layers.23.post_attention_layernorm
+ # self_attn.k_proj layers
+ - model.layers.42.self_attn.k_proj
+ - model.layers.41.self_attn.k_proj
+ - model.layers.39.self_attn.k_proj
+ - model.layers.35.self_attn.k_proj
+ - model.layers.28.self_attn.k_proj
+ - model.layers.79.self_attn.k_proj
+ - model.layers.43.self_attn.k_proj
+ - model.layers.32.self_attn.k_proj
+ - model.layers.73.self_attn.k_proj
+ - model.layers.31.self_attn.k_proj
+ - model.layers.29.self_attn.k_proj
+ - model.layers.76.self_attn.k_proj
+ - model.layers.30.self_attn.k_proj
+ - model.layers.40.self_attn.k_proj
+ - model.layers.33.self_attn.k_proj
+ - model.layers.78.self_attn.k_proj
+ - model.layers.34.self_attn.k_proj
+ - model.layers.37.self_attn.k_proj
+ - model.layers.45.self_attn.k_proj
+ - model.layers.44.self_attn.k_proj
+ - model.layers.71.self_attn.k_proj
+ - model.layers.26.self_attn.k_proj
+ - model.layers.74.self_attn.k_proj
+ - model.layers.27.self_attn.k_proj
+ # self_attn.o_proj layers
+ - model.layers.35.self_attn.o_proj
+ - model.layers.34.self_attn.o_proj
+ - model.layers.37.self_attn.o_proj
+ - model.layers.33.self_attn.o_proj
+ - model.layers.31.self_attn.o_proj
+ - model.layers.27.self_attn.o_proj
+ - model.layers.38.self_attn.o_proj
+ - model.layers.24.self_attn.o_proj
+ - model.layers.39.self_attn.o_proj
+ - model.layers.43.self_attn.o_proj
+ - model.layers.29.self_attn.o_proj
+ - model.layers.0.self_attn.o_proj
+ - model.layers.50.self_attn.o_proj
+ - model.layers.32.self_attn.o_proj
+ - model.layers.45.self_attn.o_proj
+ - model.layers.30.self_attn.o_proj
+ - model.layers.60.self_attn.o_proj
+ - model.layers.23.self_attn.o_proj
+ - model.layers.18.self_attn.o_proj
+ - model.layers.67.self_attn.o_proj
+ - model.layers.57.self_attn.o_proj
+ - model.layers.20.self_attn.o_proj
+ - model.layers.76.self_attn.o_proj
+ - model.layers.28.self_attn.o_proj
+ # self_attn.q_proj layers
+ - model.layers.1.self_attn.q_proj
+ - model.layers.6.self_attn.q_proj
+ - model.layers.0.self_attn.q_proj
+ - model.layers.5.self_attn.q_proj
+ - model.layers.2.self_attn.q_proj
+ - model.layers.7.self_attn.q_proj
+ - model.layers.3.self_attn.q_proj
+ - model.layers.4.self_attn.q_proj
+ - model.layers.8.self_attn.q_proj
+ - model.layers.9.self_attn.q_proj
+ - model.layers.61.self_attn.q_proj
+ - model.layers.10.self_attn.q_proj
+ - model.layers.62.self_attn.q_proj
+ - model.layers.36.self_attn.q_proj
+ - model.layers.15.self_attn.q_proj
+ - model.layers.11.self_attn.q_proj
+ - model.layers.17.self_attn.q_proj
+ - model.layers.60.self_attn.q_proj
+ - model.layers.63.self_attn.q_proj
+ - model.layers.64.self_attn.q_proj
+ - model.layers.29.self_attn.q_proj
+ - model.layers.30.self_attn.q_proj
+ - model.layers.55.self_attn.q_proj
+ - model.layers.34.self_attn.q_proj
+ # self_attn.v_proj layers
+ - model.layers.12.self_attn.v_proj
+ - model.layers.16.self_attn.v_proj
+ - model.layers.18.self_attn.v_proj
+ - model.layers.19.self_attn.v_proj
+ - model.layers.20.self_attn.v_proj
+ - model.layers.21.self_attn.v_proj
+ - model.layers.22.self_attn.v_proj
+ - model.layers.23.self_attn.v_proj
+ - model.layers.24.self_attn.v_proj
+ - model.layers.25.self_attn.v_proj
+ - model.layers.26.self_attn.v_proj
+ - model.layers.27.self_attn.v_proj
+ - model.layers.28.self_attn.v_proj
+ - model.layers.29.self_attn.v_proj
+ - model.layers.30.self_attn.v_proj
+ - model.layers.31.self_attn.v_proj
+ - model.layers.32.self_attn.v_proj
+ - model.layers.33.self_attn.v_proj
+ - model.layers.34.self_attn.v_proj
+ - model.layers.35.self_attn.v_proj
+ - model.layers.36.self_attn.v_proj
+ - model.layers.37.self_attn.v_proj
+ - model.layers.38.self_attn.v_proj
+ - model.layers.39.self_attn.v_proj
+
+ sequence_len: 8192 # supports up to 8192
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ # adapter: lora
+ # lora_model_dir:
+ # lora_r: 32
+ # lora_alpha: 16
+ # lora_dropout: 0.05
+ # lora_target_linear: true
+ # lora_fan_in_fan_out:
+
+ wandb_project: dolphin-2.9-qwen-1.5-110b
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 1
+ num_epochs: 1
+ optimizer: adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 1e-5
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: true
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ # resume_from_checkpoint: /workspace/axolotl/qwen-checkpoint
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 10
+ evals_per_epoch: 4
+ eval_table_size:
+ eval_max_new_tokens: 128
+ saves_per_epoch: 4
+ save_total_limit: 2
+ debug:
+ deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_params.json
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   eos_token: "<|im_end|>"
+ ```
+
+ </details><br>
+
+ # qwen-out
+
+ This model was fine-tuned from the checkpoint at `/workspace/axolotl/qwen-checkpoint` on the datasets listed in the axolotl config above.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.3931
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1e-05
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.3528 | 0.0 | 1 | 0.3848 |
+ | 0.3687 | 0.25 | 291 | 0.3988 |
+ | 0.4156 | 0.5 | 582 | 0.3966 |
+ | 0.3826 | 0.75 | 873 | 0.3931 |
+
+ ### Framework versions
+
+ - Transformers 4.40.0.dev0
+ - Pytorch 2.2.2+cu121
+ - Datasets 2.15.0
+ - Tokenizers 0.15.0
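The config above trains with `chat_template: chatml` and sets `<|im_end|>` as the end-of-turn token. A minimal sketch of how a ChatML prompt for this model could be assembled (plain Python; the helper name `format_chatml` is ours, not from the repo):

```python
# Minimal ChatML prompt builder, matching the chat_template: chatml
# and eos_token: <|im_end|> settings in the axolotl config above.
# The function name and message layout are illustrative.

def format_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model completes it.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

In practice the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) would produce the same layout; the sketch just makes the token boundaries explicit.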
added_tokens.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "<|endoftext|>": 151643,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644
+ }
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_name_or_path": "/workspace/axolotl/qwen-checkpoint",
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 8192,
+   "initializer_range": 0.02,
+   "intermediate_size": 49152,
+   "max_position_embeddings": 32768,
+   "max_window_layers": 28,
+   "model_type": "qwen2",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 80,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 1000000.0,
+   "sliding_window": 32768,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.40.0.dev0",
+   "use_cache": false,
+   "use_sliding_window": false,
+   "vocab_size": 152064
+ }
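The shapes in config.json are enough for a back-of-envelope parameter count. The sketch below ignores biases and norm weights, so it slightly undercounts; its only purpose is to confirm the ~110B scale implied by the 49 weight shards. All numbers are copied from the JSON above:

```python
# Rough parameter-count estimate from the config.json values above.
# Ignores attention biases and layernorm weights; illustrative only.

hidden = 8192      # hidden_size
inter = 49152      # intermediate_size
layers = 80        # num_hidden_layers
heads = 64         # num_attention_heads
kv_heads = 8       # num_key_value_heads (grouped-query attention)
vocab = 152064     # vocab_size

head_dim = hidden // heads       # 128
kv_dim = kv_heads * head_dim     # k/v projections are narrower under GQA

attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q/o plus k/v projections
mlp = 3 * hidden * inter                           # gate, up, down projections
embeds = 2 * vocab * hidden                        # untied embed_tokens + lm_head

total = layers * (attn + mlp) + embeds
print(f"~{total / 1e9:.1f}B parameters")
```

This lands at roughly 111B, consistent with the Qwen1.5-110B base named in the `wandb_project` above.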
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": 151643,
+   "max_new_tokens": 2048,
+   "transformers_version": "4.40.0.dev0"
+ }
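One detail worth flagging: generation_config.json stops on `<|endoftext|>` (151643), while config.json and the axolotl `special_tokens` set `<|im_end|>` (151645) as EOS. The check below, with the ids copied from the files above, makes the mismatch explicit; inference stacks generally let you pass a list of stop ids to cover both:

```python
# Token ids copied from added_tokens.json, config.json and
# generation_config.json above; the check itself is illustrative.

added_tokens = {"<|endoftext|>": 151643, "<|im_end|>": 151645, "<|im_start|>": 151644}
config_eos = 151645      # config.json "eos_token_id"     -> <|im_end|>
generation_eos = 151643  # generation_config.json         -> <|endoftext|>

# The two files disagree: a ChatML turn ends in <|im_end|>, which would not
# stop generation if only generation_config's eos id were honored.
assert config_eos == added_tokens["<|im_end|>"]
assert generation_eos == added_tokens["<|endoftext|>"]
assert config_eos != generation_eos
print("eos ids differ:", generation_eos, "vs", config_eos)
```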
merges.txt ADDED
The diff for this file is too large to render. See raw diff
model-00001-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f16188fe53cb82eb29a1aca960d138fa3ae0cb26adf6a4bdf41b94f81e0a86ce
+ size 4404040848
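Each shard entry in this diff is a Git LFS pointer file, not the weights themselves: three key-value lines giving the spec version, the SHA-256 of the real blob, and its size in bytes. A small parser for that format (a sketch; the function name is ours) could look like:

```python
# Parse a Git LFS pointer file of the kind shown for each shard above.
# Format: "version <url>", "oid sha256:<hex>", "size <bytes>".

def parse_lfs_pointer(text):
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,   # "sha256" for all shards in this commit
        "oid": digest,
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f16188fe53cb82eb29a1aca960d138fa3ae0cb26adf6a4bdf41b94f81e0a86ce
size 4404040848"""

info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])
```

The `size` field is why the shard sizes (~4.3-4.6 GB each, 49 shards) are visible in the diff even though the binary payloads are stored out-of-band.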
model-00002-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e7679502ae1e91469a0ad83e4dddfb8e02e2a862bb673e7f4d07d54042d02c0
+ size 4630620760
model-00003-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:286252966a7c8cc612c246be879323447440b3bc65a2d872d5db1773883c7de8
+ size 4630620760
model-00004-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d65adc6b18b80c5a825cf986dd4bd23efa71ea947154852014bfc7648ed1af0
+ size 4328576600
model-00005-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02bc79047666afcbec0996b57af831a11a8df76327ef7378853589258611d8f8
+ size 4630620760
model-00006-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2a6261766ff586f63a932aed9104834ae06a961ccb5a225e25f739f12f76a58
+ size 4630620760
model-00007-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f808d514d088ea3076f52b3c095b8e840563d5e961f370463625203801bc3e18
+ size 4328576616
model-00008-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e5913601faebab290f04911746a740f70b48363f858f63404b383d909f1dd82
+ size 4630620784
model-00009-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb24e01519bb2fec5f3fd7616ed6a355f04e62772864d3ce739fa3250999eded
+ size 4630620784
model-00010-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1acc0f71cd212b585e81ba6ef8e9d59eb5732ccdeafa971337b17fa9fa99a626
+ size 4328576616
model-00011-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc0bc6c24bc25f6a6f88ed31fb19e6abc41b192388867e7fad6c308ee26a1b1f
+ size 4630620784
model-00012-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:455f01815637d8d617328a7a383ef6669626c11399b7b7f0f4a9f5c8bbd456df
+ size 4630620784
model-00013-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eaf82e6d048e4ce4f5a9eb7c09dd53e9fce50d783d8c5fcedfff39b16f91be25
+ size 4328576616
model-00014-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31e4ff5a0b536e0c133ebd525ebb99e45f3e56b19edf1c6f860bf460af48e246
+ size 4630620784
model-00015-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1139dd9ee3284bb5affbdbbd40dbfef99f3b4a139e21ddbc96a979d87cd83b2f
+ size 4630620784
model-00016-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03235d7695bcde445b19d35367774e01f9043a6378d12840a201b39f0c70bff3
+ size 4328576616
model-00017-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a7e220c9e5c4ac78b279c69d1849da22766fcd16f737c9661ce1ea19905a3a6
+ size 4630620784
model-00018-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3412ae1b97c0068b0beac29bedc17a2753f15a86c258173e536d87c21b57e9a2
+ size 4630620784
model-00019-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c61d2d242c030e2fb7c59182072f2d3303d91bfe859b41857d3562e92fcd3f60
+ size 4328576616
model-00020-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8dbd3d79b57970ea8c0c5d7e76d5b6582cca922079e629b3506fcf778cbb981c
+ size 4630620784
model-00021-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5877fe7e06f3f2264f06cdbdc086eec79ff65a659b50c09d18d64d395cf70c88
+ size 4630620784
model-00022-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b6d4737efe79c2e107e024398abc280d248beb4df0af92b6de4199c98d24462
+ size 4328576616
model-00023-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f85d905e51ecde2dd275c89f8984939bb8b8c4650d6eb2910b63ca8890935227
+ size 4630620784
model-00024-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf38ac8b73355c4a4ff114a7922e503ce5fb22c9233d0951be51d8a9ed23e32a
+ size 4630620784
model-00025-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7eeba931ff2f9c90aec800668489a579e25897ab36b878aa3b4455c10fcd910b
+ size 4328576616
model-00026-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c04e0bd49d5965b51779c1be60afa72aef94c3c1168c55cd5ef540476f43a513
+ size 4630620784
model-00027-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fde3e36ce34a9119a22898498325a670ba1f3c1c5081412424ab08d5003faaa1
+ size 4630620784
model-00028-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e618d6655ab3775b3cfe29f2445d7865b17f7a1369331756222b9152116f3e5
+ size 4328576616
model-00029-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0513d96d2644e2d792ba314d3a640cc1c9dc78a1cd7da96686bd1eebed99bff2
+ size 4630620784
model-00030-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f05b6286ecedbf56b8d7b514410892dc55667616c2ac2d639a6757fff576010
+ size 4630620784
model-00031-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c409b47737a0ce1dc9f6c64f3ef6528abc9be1b7bb12663421a115db636dc060
+ size 4328576616
model-00032-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:13004d20bf600e9ced6b638474d8afa2d0b2bbdd8c0dd125777ba8221d7e0579
+ size 4630620784
model-00033-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:beeaf3683d3b4c0aacebc83f8462a2d0197bcb5bf99ef9e44898c87320417611
+ size 4630620784
model-00034-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:730a1601e24e6976439fa5f1f178f3108088ae52f232f4e448028a7c8f4b22b9
+ size 4328576616
model-00035-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:963d2a7caf1ab4d7b10e57c3b2ef7ca728b7c17d0a2dde1c199b031a3fc10d2b
+ size 4630620784
model-00036-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:749e47d36035f8b192f2957852f991021cda63cbaa03a09caa1381b443cd6a73
+ size 4630620784
model-00037-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db7111b218a8062729f72a336b1eef688b984fd22bc94233bc645cfbef4b93eb
+ size 4328576616
model-00038-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2ea39b7d028d2a922c8395ef307fac2e9102e1b2741c8fbc6bc78fd8abc5467a
+ size 4630620784
model-00039-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71dd7c4e60bd514f4c8d394f144f112504c70070e2ee6bebc49075aedd840bc8
+ size 4630620784
model-00040-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d1eb9084a2b2e7dbb486e0c4cb8452dba7bdb2362ae9b03526236b724d7bc9f
+ size 4328576616
model-00041-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66020f1d9d29129a3e6f8e7acd1b231409a1cf9d029b5b41a5bf0701a9af267c
+ size 4630620784
model-00042-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be713fe1718e74147ec85183f4dfbd3047d7bdf1a366254e17d45a16564eaf17
+ size 4630620784
model-00043-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a74e4d2ddd706e027257fcf074c4049bd3922673f21e0d18947e987e78b0b98
+ size 4328576616
model-00044-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7455f8a6c80d0eacf02f264d6efe287d50bc3aab6bcc359b49db7d6d1a402941
+ size 4630620784
model-00045-of-00049.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d5174baeb19471ae62e801077be55e0d0f6addcd35dd63e81fc30d43e9fbb94e
+ size 4630620784