macadeliccc committed on
Commit
541f569
1 Parent(s): babaa85

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +745 -0
README.md ADDED
@@ -0,0 +1,745 @@
1
+ ---
2
+ library_name: transformers
3
+ license: llama3.2
4
+ base_model: macadeliccc/magistrate-3.2-3b-base
5
+ tags:
6
+ - generated_from_trainer
7
+ model-index:
8
+ - name: outputs/magistrate-3.2-3b
9
+ results: []
10
+ ---
11
+
12
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
+ should probably proofread and complete it, then remove this comment. -->
14
+
15
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
16
+ <details><summary>See axolotl config</summary>
17
+
18
+ axolotl version: `0.4.1`
19
+ ```yaml
20
+ base_model: macadeliccc/magistrate-3.2-3b-base
21
+ model_type: LlamaForCausalLM
22
+ tokenizer_type: AutoTokenizer
23
+
24
+ load_in_8bit: false
25
+ load_in_4bit: false
26
+ strict: false
27
+
28
+ datasets:
29
+ - path: json
30
+ type: sharegpt
31
+ conversation: chatml
32
+ data_files: train/hermes-2.5.jsonl
33
+ # - path: json
34
+ # type: sharegpt
35
+ # conversation: chatml
36
+ # data_files: train/financial_instructions_cleaned_2.json
37
+ - path: json
38
+ type: sharegpt
39
+ conversation: chatml
40
+ data_files: train/glaive-function-calling-5k.json
41
+ - path: json
42
+ type: sharegpt
43
+ conversation: chatml
44
+ data_files: train/func-calling-singleturn.json
45
+ - path: json
46
+ type: sharegpt
47
+ conversation: chatml
48
+ data_files: train/func-calling.json
49
+ - path: json
50
+ type: sharegpt
51
+ conversation: chatml
52
+ data_files: train/json-mode-agentic.json
53
+ - path: json
54
+ type: sharegpt
55
+ conversation: chatml
56
+ data_files: train/json-mode-singleturn.json
57
+ - path: json
58
+ type: sharegpt
59
+ conversation: chatml
60
+ data_files: train/reasoning_sharegpt.json
61
+ - path: json
62
+ type: sharegpt
63
+ conversation: chatml
64
+ data_files: train/systemchat_2_0_small.json
65
+ - path: json
66
+ type: sharegpt
67
+ conversation: chatml
68
+ data_files: train/argument_dataset/303_creative_llc_v__elenis_sharegpt.json
69
+ - path: json
70
+ type: sharegpt
71
+ conversation: chatml
72
+ data_files: train/argument_dataset/abitron_austria_gmbh_v__hetronic_international__inc__sharegpt.json
73
+ - path: json
74
+ type: sharegpt
75
+ conversation: chatml
76
+ data_files: train/argument_dataset/acheson_hotels__llc_v__laufer_sharegpt.json
77
+ - path: json
78
+ type: sharegpt
79
+ conversation: chatml
80
+ data_files: train/argument_dataset/alexander_v__sc_conference_of_naacp_sharegpt.json
81
+ - path: json
82
+ type: sharegpt
83
+ conversation: chatml
84
+ data_files: train/argument_dataset/amgen_inc__v__sanofi_sharegpt.json
85
+ - path: json
86
+ type: sharegpt
87
+ conversation: chatml
88
+ data_files: train/argument_dataset/andy_warhol_found___inc__v__goldsmith_sharegpt.json
89
+ - path: json
90
+ type: sharegpt
91
+ conversation: chatml
92
+ data_files: train/argument_dataset/arizona_v__navajo_nation_sharegpt.json
93
+ - path: json
94
+ type: sharegpt
95
+ conversation: chatml
96
+ data_files: train/argument_dataset/becerra__sec__of_h_hs_v__san_carlos_apache_tribe_sharegpt.json
97
+ - path: json
98
+ type: sharegpt
99
+ conversation: chatml
100
+ data_files: train/argument_dataset/biden_v__nebraska_sharegpt.json
101
+ - path: json
102
+ type: sharegpt
103
+ conversation: chatml
104
+ data_files: train/argument_dataset/bissonnette_v__lepage_bakeries_park_st___llc_sharegpt.json
105
+ - path: json
106
+ type: sharegpt
107
+ conversation: chatml
108
+ data_files: train/argument_dataset/bittner_v__united_states_sharegpt.json
109
+ - path: json
110
+ type: sharegpt
111
+ conversation: chatml
112
+ data_files: train/argument_dataset/brown_v__united_states_sharegpt.json
113
+ - path: json
114
+ type: sharegpt
115
+ conversation: chatml
116
+ data_files: train/argument_dataset/cantero_v__bank_of_america__n_a__sharegpt.json
117
+ - path: json
118
+ type: sharegpt
119
+ conversation: chatml
120
+ data_files: train/argument_dataset/cfpb_v__com__fin__services_assn__sharegpt.json
121
+ - path: json
122
+ type: sharegpt
123
+ conversation: chatml
124
+ data_files: train/argument_dataset/chiaverini_v__city_of_napoleon_sharegpt.json
125
+ - path: json
126
+ type: sharegpt
127
+ conversation: chatml
128
+ data_files: train/argument_dataset/ciminelli_v__united_state_sharegpt.json
129
+ - path: json
130
+ type: sharegpt
131
+ conversation: chatml
132
+ data_files: train/argument_dataset/city_of_grants_pass_v__johnson_sharegpt.json
133
+ - path: json
134
+ type: sharegpt
135
+ conversation: chatml
136
+ data_files: train/argument_dataset/coinbase__inc__v__bielski_sharegpt.json
137
+ - path: json
138
+ type: sharegpt
139
+ conversation: chatml
140
+ data_files: train/argument_dataset/coinbase__inc__v__suski_sharegpt.json
141
+ - path: json
142
+ type: sharegpt
143
+ conversation: chatml
144
+ data_files: train/argument_dataset/connelly_v__united_states_sharegpt.json
145
+ - path: json
146
+ type: sharegpt
147
+ conversation: chatml
148
+ data_files: train/argument_dataset/corner_post__inc__v__bd__of_governors__frs_sharegpt.json
149
+ - path: json
150
+ type: sharegpt
151
+ conversation: chatml
152
+ data_files: train/argument_dataset/counterman_v__colorado_sharegpt.json
153
+ - path: json
154
+ type: sharegpt
155
+ conversation: chatml
156
+ data_files: train/argument_dataset/cruz_v__arizona_sharegpt.json
157
+ - path: json
158
+ type: sharegpt
159
+ conversation: chatml
160
+ data_files: train/argument_dataset/culley_v__marshall_sharegpt.json
161
+ - path: json
162
+ type: sharegpt
163
+ conversation: chatml
164
+ data_files: train/argument_dataset/dept__of_agric__rural_dev__v__kirtz_sharegpt.json
165
+ - path: json
166
+ type: sharegpt
167
+ conversation: chatml
168
+ data_files: train/argument_dataset/dept__of_education_v__brown_sharegpt.json
169
+ - path: json
170
+ type: sharegpt
171
+ conversation: chatml
172
+ data_files: train/argument_dataset/dept__of_state_v__munoz_sharegpt.json
173
+ - path: json
174
+ type: sharegpt
175
+ conversation: chatml
176
+ data_files: train/argument_dataset/devillier_v__texas_sharegpt.json
177
+ - path: json
178
+ type: sharegpt
179
+ conversation: chatml
180
+ data_files: train/argument_dataset/diaz_v__united_states_sharegpt.json
181
+ - path: json
182
+ type: sharegpt
183
+ conversation: chatml
184
+ data_files: train/argument_dataset/dubin_v__united_states_sharegpt.json
185
+ - path: json
186
+ type: sharegpt
187
+ conversation: chatml
188
+ data_files: train/argument_dataset/dupree_v__younger_sharegpt.json
189
+ - path: json
190
+ type: sharegpt
191
+ conversation: chatml
192
+ data_files: train/argument_dataset/erlinger_v__united_states_sharegpt.json
193
+ - path: json
194
+ type: sharegpt
195
+ conversation: chatml
196
+ data_files: train/argument_dataset/fbi_v__fikre_sharegpt.json
197
+ - path: json
198
+ type: sharegpt
199
+ conversation: chatml
200
+ data_files: train/argument_dataset/fda_v__alliance_hippocratic_medicine_sharegpt.json
201
+ - path: json
202
+ type: sharegpt
203
+ conversation: chatml
204
+ data_files: train/argument_dataset/financial_oversight_board_v__cpi_sharegpt.json
205
+ - path: json
206
+ type: sharegpt
207
+ conversation: chatml
208
+ data_files: train/argument_dataset/fischer_v__united_states_sharegpt.json
209
+ - path: json
210
+ type: sharegpt
211
+ conversation: chatml
212
+ data_files: train/argument_dataset/garland__att_y_gen__v__cargill_sharegpt.json
213
+ - path: json
214
+ type: sharegpt
215
+ conversation: chatml
216
+ data_files: train/argument_dataset/glacier_northwest__inc__v__int_l_brotherhood_of_teamsters_sharegpt.json
217
+ - path: json
218
+ type: sharegpt
219
+ conversation: chatml
220
+ data_files: train/argument_dataset/gonzalez_v__google_llc_sharegpt.json
221
+ - path: json
222
+ type: sharegpt
223
+ conversation: chatml
224
+ data_files: train/argument_dataset/gonzalez_v__trevino_sharegpt.json
225
+ - path: json
226
+ type: sharegpt
227
+ conversation: chatml
228
+ data_files: train/argument_dataset/great_lakes_insurance_se_v__raiders_retreat_realty_co___llc_sharegpt.json
229
+ - path: json
230
+ type: sharegpt
231
+ conversation: chatml
232
+ data_files: train/argument_dataset/groff_v__dejoy_sharegpt.json
233
+ - path: json
234
+ type: sharegpt
235
+ conversation: chatml
236
+ data_files: train/argument_dataset/harrington_v__purdue_pharma_l_p__sharegpt.json
237
+ - path: json
238
+ type: sharegpt
239
+ conversation: chatml
240
+ data_files: train/argument_dataset/harrow_v__dept__of_defense_sharegpt.json
241
+ - path: json
242
+ type: sharegpt
243
+ conversation: chatml
244
+ data_files: train/argument_dataset/health_and_hospital_corp__v__talevski_sharegpt.json
245
+ - path: json
246
+ type: sharegpt
247
+ conversation: chatml
248
+ data_files: train/argument_dataset/helix_energy_solutions_v__hewitt_sharegpt.json
249
+ - path: json
250
+ type: sharegpt
251
+ conversation: chatml
252
+ data_files: train/argument_dataset/in_re_grand_jury_sharegpt.json
253
+ - path: json
254
+ type: sharegpt
255
+ conversation: chatml
256
+ data_files: train/argument_dataset/jack_daniel_s_properties__inc__v__vip_products_sharegpt.json
257
+ - path: json
258
+ type: sharegpt
259
+ conversation: chatml
260
+ data_files: train/argument_dataset/jones_v__hendrix_sharegpt.json
261
+ - path: json
262
+ type: sharegpt
263
+ conversation: chatml
264
+ data_files: train/argument_dataset/karcho_polselli_v__irs_sharegpt.json
265
+ - path: json
266
+ type: sharegpt
267
+ conversation: chatml
268
+ data_files: train/argument_dataset/lac_du_flambeau_band_v__coughlin_sharegpt.json
269
+ - path: json
270
+ type: sharegpt
271
+ conversation: chatml
272
+ data_files: train/argument_dataset/lindke_v__freed_sharegpt.json
273
+ - path: json
274
+ type: sharegpt
275
+ conversation: chatml
276
+ data_files: train/argument_dataset/loper_bright_enterprises__inc__v__raimondo__sec__of_comm__sharegpt.json
277
+ - path: json
278
+ type: sharegpt
279
+ conversation: chatml
280
+ data_files: train/argument_dataset/lora_v__united_states_sharegpt.json
281
+ - path: json
282
+ type: sharegpt
283
+ conversation: chatml
284
+ data_files: train/argument_dataset/macquarie_infrastructure_corp__v__moab_partners__l_p__sharegpt.json
285
+ - path: json
286
+ type: sharegpt
287
+ conversation: chatml
288
+ data_files: train/argument_dataset/mallory_v__norfolk_southern_railway_co__sharegpt.json
289
+ - path: json
290
+ type: sharegpt
291
+ conversation: chatml
292
+ data_files: train/argument_dataset/mcintosh_v__united_states_sharegpt.json
293
+ - path: json
294
+ type: sharegpt
295
+ conversation: chatml
296
+ data_files: train/argument_dataset/merrill_v__milligan_sharegpt.json
297
+ - path: json
298
+ type: sharegpt
299
+ conversation: chatml
300
+ data_files: train/argument_dataset/moore_v__harper_sharegpt.json
301
+ - path: json
302
+ type: sharegpt
303
+ conversation: chatml
304
+ data_files: train/argument_dataset/moore_v__united_states_sharegpt.json
305
+ - path: json
306
+ type: sharegpt
307
+ conversation: chatml
308
+ data_files: train/argument_dataset/moyle_v__united_states_sharegpt.json
309
+ - path: json
310
+ type: sharegpt
311
+ conversation: chatml
312
+ data_files: train/argument_dataset/muldrow_v__st__louis_sharegpt.json
313
+ - path: json
314
+ type: sharegpt
315
+ conversation: chatml
316
+ data_files: train/argument_dataset/murray_v__ubs_securities__llc_sharegpt.json
317
+ - path: json
318
+ type: sharegpt
319
+ conversation: chatml
320
+ data_files: train/argument_dataset/murthy__surgeon_gen__v__missouri_sharegpt.json
321
+ - path: json
322
+ type: sharegpt
323
+ conversation: chatml
324
+ data_files: train/argument_dataset/netchoice__llc_v__paxton_sharegpt.json
325
+ - path: json
326
+ type: sharegpt
327
+ conversation: chatml
328
+ data_files: train/argument_dataset/new_york_v__new_jersey_sharegpt.json
329
+ - path: json
330
+ type: sharegpt
331
+ conversation: chatml
332
+ data_files: train/argument_dataset/nra_v__vullo_sharegpt.json
333
+ - path: json
334
+ type: sharegpt
335
+ conversation: chatml
336
+ data_files: train/argument_dataset/o_connor_ratcliff_v__garnier_sharegpt.json
337
+ - path: json
338
+ type: sharegpt
339
+ conversation: chatml
340
+ data_files: train/argument_dataset/oh_adjutant_gen__s_dept__v__flra_sharegpt.json
341
+ - path: json
342
+ type: sharegpt
343
+ conversation: chatml
344
+ data_files: train/argument_dataset/ohio_v__epa_sharegpt.json
345
+ - path: json
346
+ type: sharegpt
347
+ conversation: chatml
348
+ data_files: train/argument_dataset/perez_v__sturgis_public_schools_sharegpt.json
349
+ - path: json
350
+ type: sharegpt
351
+ conversation: chatml
352
+ data_files: train/argument_dataset/pugin_v__garland_sharegpt.json
353
+ - path: json
354
+ type: sharegpt
355
+ conversation: chatml
356
+ data_files: train/argument_dataset/pulsifer_v__united_states_sharegpt.json
357
+ - path: json
358
+ type: sharegpt
359
+ conversation: chatml
360
+ data_files: train/argument_dataset/relentless__inc__v__dept__of_commerce_sharegpt.json
361
+ - path: json
362
+ type: sharegpt
363
+ conversation: chatml
364
+ data_files: train/argument_dataset/rudisill_v__mcdonough__sec__of_va_sharegpt.json
365
+ - path: json
366
+ type: sharegpt
367
+ conversation: chatml
368
+ data_files: train/argument_dataset/sackett_v__epa_sharegpt.json
369
+ - path: json
370
+ type: sharegpt
371
+ conversation: chatml
372
+ data_files: train/argument_dataset/samia_v__united_states_sharegpt.json
373
+ - path: json
374
+ type: sharegpt
375
+ conversation: chatml
376
+ data_files: train/argument_dataset/santos_zacaria_v__garland__att_y_gen__sharegpt.json
377
+ - path: json
378
+ type: sharegpt
379
+ conversation: chatml
380
+ data_files: train/argument_dataset/sec_v__cochran_sharegpt.json
381
+ - path: json
382
+ type: sharegpt
383
+ conversation: chatml
384
+ data_files: train/argument_dataset/sec_v__jarkesy_sharegpt.json
385
+ - path: json
386
+ type: sharegpt
387
+ conversation: chatml
388
+ data_files: train/argument_dataset/sheetz_v__county_of_el_dorado_sharegpt.json
389
+ - path: json
390
+ type: sharegpt
391
+ conversation: chatml
392
+ data_files: train/argument_dataset/slack_technologies__llc_v__pirani_sharegpt.json
393
+ - path: json
394
+ type: sharegpt
395
+ conversation: chatml
396
+ data_files: train/argument_dataset/smith_v__arizona_sharegpt.json
397
+ - path: json
398
+ type: sharegpt
399
+ conversation: chatml
400
+ data_files: train/argument_dataset/smith_v__spizzirri_sharegpt.json
401
+ - path: json
402
+ type: sharegpt
403
+ conversation: chatml
404
+ data_files: train/argument_dataset/smith_v__united_states_sharegpt.json
405
+ - path: json
406
+ type: sharegpt
407
+ conversation: chatml
408
+ data_files: train/argument_dataset/snyder_v__united_states_sharegpt.json
409
+ - path: json
410
+ type: sharegpt
411
+ conversation: chatml
412
+ data_files: train/argument_dataset/starbucks_corp__v__mckinney_sharegpt.json
413
+ - path: json
414
+ type: sharegpt
415
+ conversation: chatml
416
+ data_files: train/argument_dataset/students_for_fair_admissions_v__university_of_nc_sharegpt.json
417
+ - path: json
418
+ type: sharegpt
419
+ conversation: chatml
420
+ data_files: train/argument_dataset/texas_v__new_mexico_and_colorado_sharegpt.json
421
+ - path: json
422
+ type: sharegpt
423
+ conversation: chatml
424
+ data_files: train/argument_dataset/thornell_v__jones_sharegpt.json
425
+ - path: json
426
+ type: sharegpt
427
+ conversation: chatml
428
+ data_files: train/argument_dataset/truck_insurance_exchange_v__kaiser_gypsum_co__inc__sharegpt.json
429
+ - path: json
430
+ type: sharegpt
431
+ conversation: chatml
432
+ data_files: train/argument_dataset/trump_v__anderson_sharegpt.json
433
+ - path: json
434
+ type: sharegpt
435
+ conversation: chatml
436
+ data_files: train/argument_dataset/turkiye_halk_bankasi_a_s__v__united_states_sharegpt.json
437
+ - path: json
438
+ type: sharegpt
439
+ conversation: chatml
440
+ data_files: train/argument_dataset/twitter__inc__v__taamneh_sharegpt.json
441
+ - path: json
442
+ type: sharegpt
443
+ conversation: chatml
444
+ data_files: train/argument_dataset/tyler_v__hennepin_county_sharegpt.json
445
+ - path: json
446
+ type: sharegpt
447
+ conversation: chatml
448
+ data_files: train/argument_dataset/u_s___ex_rel__polansky_v__executive_health_sharegpt.json
449
+ - path: json
450
+ type: sharegpt
451
+ conversation: chatml
452
+ data_files: train/argument_dataset/u_s___ex_rel__schutte_v__supervalu_inc__sharegpt.json
453
+ - path: json
454
+ type: sharegpt
455
+ conversation: chatml
456
+ data_files: train/argument_dataset/united_states_trustee_v__john_q__hammons_fall_2006__llc_sharegpt.json
457
+ - path: json
458
+ type: sharegpt
459
+ conversation: chatml
460
+ data_files: train/argument_dataset/united_states_v__hansen_sharegpt.json
461
+ - path: json
462
+ type: sharegpt
463
+ conversation: chatml
464
+ data_files: train/argument_dataset/united_states_v__rahimi_sharegpt.json
465
+ - path: json
466
+ type: sharegpt
467
+ conversation: chatml
468
+ data_files: train/argument_dataset/united_states_v__texas_sharegpt.json
469
+ - path: json
470
+ type: sharegpt
471
+ conversation: chatml
472
+ data_files: train/argument_dataset/vidal__under_sec__of_comm__v__elster_sharegpt.json
473
+ - path: json
474
+ type: sharegpt
475
+ conversation: chatml
476
+ data_files: train/argument_dataset/warner_chappell_music__inc__v__nealy_sharegpt.json
477
+ - path: json
478
+ type: sharegpt
479
+ conversation: chatml
480
+ data_files: train/argument_dataset/wilkins_v__united_states_sharegpt.json
481
+ - path: json
482
+ type: sharegpt
483
+ conversation: chatml
484
+ data_files: train/argument_dataset/wilkinson_v__garland__att_y_gen__sharegpt.json
485
+ - path: json
486
+ type: sharegpt
487
+ conversation: chatml
488
+ data_files: train/argument_dataset/yegiazaryan_v__smagin_sharegpt.json
489
+
490
+ chat_template: chatml
491
+
492
+ unfrozen_parameters:
493
+ - ^lm_head.weight$
494
+ - ^model.embed_tokens.weight$
495
+ # input_layernorm layers
496
+ - model.layers.0.input_layernorm
497
+ - model.layers.1.input_layernorm
498
+ - model.layers.2.input_layernorm
499
+ - model.layers.3.input_layernorm
500
+ - model.layers.4.input_layernorm
501
+ - model.layers.5.input_layernorm
502
+ - model.layers.6.input_layernorm
503
+ - model.layers.7.input_layernorm
504
+ - model.layers.8.input_layernorm
505
+ - model.layers.9.input_layernorm
506
+ - model.layers.10.input_layernorm
507
+ - model.layers.11.input_layernorm
508
+ - model.layers.12.input_layernorm
509
+ - model.layers.13.input_layernorm
510
+ # mlp.down_proj layers
511
+ - model.layers.0.mlp.down_proj
512
+ - model.layers.1.mlp.down_proj
513
+ - model.layers.17.mlp.down_proj
514
+ - model.layers.19.mlp.down_proj
515
+ - model.layers.18.mlp.down_proj
516
+ - model.layers.5.mlp.down_proj
517
+ - model.layers.20.mlp.down_proj
518
+ - model.layers.2.mlp.down_proj
519
+ - model.layers.4.mlp.down_proj
520
+ - model.layers.6.mlp.down_proj
521
+ - model.layers.3.mlp.down_proj
522
+ - model.layers.16.mlp.down_proj
523
+ - model.layers.15.mlp.down_proj
524
+ - model.layers.13.mlp.down_proj
525
+ # mlp.gate_proj layers
526
+ - model.layers.0.mlp.gate_proj
527
+ - model.layers.1.mlp.gate_proj
528
+ - model.layers.2.mlp.gate_proj
529
+ - model.layers.3.mlp.gate_proj
530
+ - model.layers.22.mlp.gate_proj
531
+ - model.layers.21.mlp.gate_proj
532
+ - model.layers.20.mlp.gate_proj
533
+ - model.layers.23.mlp.gate_proj
534
+ - model.layers.19.mlp.gate_proj
535
+ - model.layers.4.mlp.gate_proj
536
+ - model.layers.18.mlp.gate_proj
537
+ - model.layers.17.mlp.gate_proj
538
+ - model.layers.5.mlp.gate_proj
539
+ - model.layers.24.mlp.gate_proj
540
+ # mlp.up_proj layers
541
+ - model.layers.4.mlp.up_proj
542
+ - model.layers.3.mlp.up_proj
543
+ - model.layers.5.mlp.up_proj
544
+ - model.layers.6.mlp.up_proj
545
+ - model.layers.7.mlp.up_proj
546
+ - model.layers.2.mlp.up_proj
547
+ - model.layers.8.mlp.up_proj
548
+ - model.layers.14.mlp.up_proj
549
+ - model.layers.13.mlp.up_proj
550
+ - model.layers.11.mlp.up_proj
551
+ - model.layers.9.mlp.up_proj
552
+ - model.layers.1.mlp.up_proj
553
+ - model.layers.15.mlp.up_proj
554
+ - model.layers.12.mlp.up_proj
555
+ # post_attention_layernorm layers
556
+ - model.layers.0.post_attention_layernorm
557
+ - model.layers.1.post_attention_layernorm
558
+ - model.layers.2.post_attention_layernorm
559
+ - model.layers.3.post_attention_layernorm
560
+ - model.layers.4.post_attention_layernorm
561
+ - model.layers.5.post_attention_layernorm
562
+ - model.layers.6.post_attention_layernorm
563
+ - model.layers.7.post_attention_layernorm
564
+ - model.layers.8.post_attention_layernorm
565
+ - model.layers.9.post_attention_layernorm
566
+ - model.layers.10.post_attention_layernorm
567
+ - model.layers.11.post_attention_layernorm
568
+ - model.layers.12.post_attention_layernorm
569
+ - model.layers.13.post_attention_layernorm
570
+ # self_attn.k_proj layers
571
+ - model.layers.25.self_attn.k_proj
572
+ - model.layers.22.self_attn.k_proj
573
+ - model.layers.19.self_attn.k_proj
574
+ - model.layers.20.self_attn.k_proj
575
+ - model.layers.17.self_attn.k_proj
576
+ - model.layers.24.self_attn.k_proj
577
+ - model.layers.23.self_attn.k_proj
578
+ - model.layers.18.self_attn.k_proj
579
+ - model.layers.21.self_attn.k_proj
580
+ - model.layers.27.self_attn.k_proj
581
+ - model.layers.15.self_attn.k_proj
582
+ - model.layers.10.self_attn.k_proj
583
+ - model.layers.6.self_attn.k_proj
584
+ - model.layers.5.self_attn.k_proj
585
+ # self_attn.o_proj layers
586
+ - model.layers.13.self_attn.o_proj
587
+ - model.layers.7.self_attn.o_proj
588
+ - model.layers.12.self_attn.o_proj
589
+ - model.layers.10.self_attn.o_proj
590
+ - model.layers.5.self_attn.o_proj
591
+ - model.layers.21.self_attn.o_proj
592
+ - model.layers.6.self_attn.o_proj
593
+ - model.layers.19.self_attn.o_proj
594
+ - model.layers.8.self_attn.o_proj
595
+ - model.layers.20.self_attn.o_proj
596
+ - model.layers.22.self_attn.o_proj
597
+ - model.layers.9.self_attn.o_proj
598
+ - model.layers.17.self_attn.o_proj
599
+ - model.layers.11.self_attn.o_proj
600
+ # self_attn.q_proj layers
601
+ - model.layers.12.self_attn.q_proj
602
+ - model.layers.13.self_attn.q_proj
603
+ - model.layers.9.self_attn.q_proj
604
+ - model.layers.8.self_attn.q_proj
605
+ - model.layers.10.self_attn.q_proj
606
+ - model.layers.14.self_attn.q_proj
607
+ - model.layers.11.self_attn.q_proj
608
+ - model.layers.15.self_attn.q_proj
609
+ - model.layers.26.self_attn.q_proj
610
+ - model.layers.6.self_attn.q_proj
611
+ - model.layers.7.self_attn.q_proj
612
+ - model.layers.16.self_attn.q_proj
613
+ - model.layers.5.self_attn.q_proj
614
+ - model.layers.25.self_attn.q_proj
615
+ # model.norm layers
616
+ # self_attn.v_proj layers
617
+ - model.layers.23.self_attn.v_proj
618
+ - model.layers.14.self_attn.v_proj
619
+ - model.layers.15.self_attn.v_proj
620
+ - model.layers.19.self_attn.v_proj
621
+ - model.layers.3.self_attn.v_proj
622
+ - model.layers.18.self_attn.v_proj
623
+ - model.layers.25.self_attn.v_proj
624
+ - model.layers.4.self_attn.v_proj
625
+ - model.layers.17.self_attn.v_proj
626
+ - model.layers.22.self_attn.v_proj
627
+ - model.layers.20.self_attn.v_proj
628
+ - model.layers.13.self_attn.v_proj
629
+ - model.layers.6.self_attn.v_proj
630
+ - model.layers.27.self_attn.v_proj
631
+
632
+ val_set_size: 0.05
633
+ output_dir: ./outputs/magistrate-3.2-3b
634
+
635
+ sequence_len: 8192
636
+ sample_packing: true
637
+ eval_sample_packing: false
638
+ pad_to_sequence_len: true
639
+
640
+ adapter:
641
+
642
+ wandb_project:
643
+ wandb_entity:
644
+ wandb_watch:
645
+ wandb_name:
646
+ wandb_log_model:
647
+
648
+ gradient_accumulation_steps: 8
649
+ micro_batch_size: 1
650
+ num_epochs: 3
651
+ optimizer: paged_adamw_32bit
652
+ lr_scheduler: cosine
653
+ learning_rate: 2e-4
654
+
655
+ train_on_inputs: false
656
+ group_by_length: false
657
+ bf16: auto
658
+ fp16:
659
+ tf32: false
660
+
661
+ gradient_checkpointing: true
662
+ early_stopping_patience:
663
+ resume_from_checkpoint:
664
+ local_rank:
665
+ logging_steps: 1
666
+ xformers_attention:
667
+ flash_attention: true
668
+ s2_attention:
669
+
670
+ warmup_steps: 1000
671
+ evals_per_epoch: 2
672
+ eval_table_size:
673
+ eval_max_new_tokens: 128
674
+ saves_per_epoch: 1
675
+ debug:
676
+ deepspeed: deepspeed_configs/zero3.json
677
+ weight_decay: 0.0
678
+ fsdp:
679
+ fsdp_config:
680
+ special_tokens:
681
+ eos_token: "<|im_end|>"
682
+ pad_token: "<|end_of_text|>"
683
+ tokens:
684
+ - "<|im_start|>"
685
+ - "<|im_end|>"
686
+ ```
687
+
688
+ </details><br>
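The `unfrozen_parameters` entries in the config read like regular expressions matched against parameter names, so only the listed layers are trained and everything else stays frozen. A minimal sketch of that selection logic follows, assuming prefix-style matching with Python's `re.match`. The helper `is_unfrozen` and the sample parameter names are illustrative assumptions, not axolotl's actual implementation.

```python
import re

# A few patterns copied from the config above.
UNFROZEN_PATTERNS = [
    r"^lm_head.weight$",
    r"^model.embed_tokens.weight$",
    r"model.layers.0.input_layernorm",
    r"model.layers.13.mlp.down_proj",
]


def is_unfrozen(param_name, patterns=UNFROZEN_PATTERNS):
    """Return True if any pattern matches from the start of the parameter name."""
    return any(re.match(p, param_name) for p in patterns)


# Hypothetical parameter names in the style of a Llama checkpoint.
names = [
    "lm_head.weight",
    "model.layers.0.input_layernorm.weight",
    "model.layers.13.mlp.down_proj.weight",
    "model.layers.14.mlp.down_proj.weight",
]

# Keep only the names selected by at least one pattern.
trainable = [n for n in names if is_unfrozen(n)]
```

In this sketch the first three names are selected and `model.layers.14.mlp.down_proj.weight` stays frozen, because no listed pattern covers layer 14's `down_proj`.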
689
+
690
+ # outputs/magistrate-3.2-3b
691
+
692
+ This model is a fine-tuned version of [macadeliccc/magistrate-3.2-3b-base](https://huggingface.co/macadeliccc/magistrate-3.2-3b-base) on the local ShareGPT-format JSON datasets listed in the axolotl config above.
693
+ It achieves the following results on the evaluation set:
694
+ - Loss: 0.8067
695
+
696
+ ## Model description
697
+
698
+ More information needed
699
+
700
+ ## Intended uses & limitations
701
+
702
+ More information needed
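The config sets `chat_template: chatml` and adds `<|im_start|>` and `<|im_end|>` as tokens, so prompts are presumably expected in ChatML layout. A minimal sketch of that layout in plain Python follows. The helper `to_chatml` and the sample messages are assumptions for illustration, and the template stored with the tokenizer may differ in details such as whitespace.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Hypothetical conversation used only to show the layout.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the question presented."},
]
prompt = to_chatml(messages)
```

Generation would then be expected to stop at `<|im_end|>`, which the config declares as the `eos_token`.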
703
+
704
+ ## Training and evaluation data
705
+
706
+ More information needed
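Every dataset entry in the config uses `type: sharegpt`, and `val_set_size: 0.05` holds out 5% of the data for evaluation. A minimal sketch of how a ShareGPT-style record maps onto role/content messages follows, assuming the common `conversations` / `from` / `value` field names. The sample record is hypothetical and is not taken from the training files.

```python
# Common ShareGPT speaker labels mapped to chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_messages(record):
    """Convert one ShareGPT-style record into a list of role/content dicts."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]


# Hypothetical record, shown only to illustrate the shape of the data.
record = {
    "conversations": [
        {"from": "human", "value": "What was the question presented?"},
        {"from": "gpt", "value": "Whether the statute applies extraterritorially."},
    ]
}
messages = sharegpt_to_messages(record)
```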
707
+
708
+ ## Training procedure
709
+
710
+ ### Training hyperparameters
711
+
712
+ The following hyperparameters were used during training:
713
+ - learning_rate: 0.0002
714
+ - train_batch_size: 1
715
+ - eval_batch_size: 1
716
+ - seed: 42
717
+ - distributed_type: multi-GPU
718
+ - num_devices: 2
719
+ - gradient_accumulation_steps: 8
720
+ - total_train_batch_size: 16
721
+ - total_eval_batch_size: 2
722
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
723
+ - lr_scheduler_type: cosine
724
+ - lr_scheduler_warmup_steps: 1000
725
+ - num_epochs: 3
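The derived values in this list follow from the others: the total train batch size is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below reproduces that arithmetic and the shape of a cosine schedule with linear warmup. The `total_steps` value is an assumption taken from the last logged step in the results table, so the real total may differ slightly, and `lr_at` is an illustrative helper rather than the Trainer's own code.

```python
import math

# Values from the hyperparameter list above.
micro_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 2
learning_rate = 2e-4
warmup_steps = 1000

# Assumption: total optimizer steps taken from the last logged row (step 6102).
total_steps = 6102

# 1 * 8 * 2 = 16, matching total_train_batch_size in the list.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step):
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the peak rate of 2e-4 is reached at step 1000, roughly halfway through the first epoch.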
726
+
727
+ ### Training results
728
+
+ | Training Loss | Epoch  | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.3754        | 0.0005 | 1    | 1.7429          |
+ | 1.0           | 0.5002 | 1017 | 0.8864          |
+ | 0.9482        | 1.0005 | 2034 | 0.8395          |
+ | 0.6817        | 1.4987 | 3051 | 0.8063          |
+ | 0.697         | 1.9991 | 4068 | 0.7580          |
+ | 0.3769        | 2.4966 | 5085 | 0.8140          |
+ | 0.4278        | 2.9965 | 6102 | 0.8067          |
738
+
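The reported evaluation loss of 0.8067 is the final row of the table, not the lowest one. A small sketch for reading the table follows: it finds the row with the lowest validation loss and converts losses to perplexity with `exp(loss)`. The rows are copied from the table, and the variable names are illustrative.

```python
import math

# (epoch, step, validation_loss) rows copied from the table above.
rows = [
    (0.0005, 1, 1.7429),
    (0.5002, 1017, 0.8864),
    (1.0005, 2034, 0.8395),
    (1.4987, 3051, 0.8063),
    (1.9991, 4068, 0.7580),
    (2.4966, 5085, 0.8140),
    (2.9965, 6102, 0.8067),
]

# Row with the lowest validation loss.
best_epoch, best_step, best_loss = min(rows, key=lambda r: r[2])
final_loss = rows[-1][2]

# Perplexity is the exponential of the mean cross-entropy loss.
best_ppl = math.exp(best_loss)
final_ppl = math.exp(final_loss)
```

The lowest validation loss is at step 4068, near the end of the second epoch. Training loss keeps falling in the third epoch while validation loss rises, which may indicate some overfitting by the final checkpoint.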
739
+
740
+ ### Framework versions
741
+
742
+ - Transformers 4.45.0
743
+ - Pytorch 2.3.1+cu121
744
+ - Datasets 2.21.0
745
+ - Tokenizers 0.20.0