BigHuggyD committed on
Commit a453544
1 Parent(s): 1602023

Upload 2 files

Files changed (2)
  1. LICENSE +0 -0
  2. README.md +512 -0
LICENSE ADDED
File without changes
README.md ADDED
@@ -0,0 +1,512 @@
---
license: other
library_name: transformers
tags:
- generated_from_trainer
base_model: Qwen/Qwen2.5-72B
datasets:
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- Nopm/Opus_WritingStruct
- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
- Gryphe/Sonnet3.5-Charcard-Roleplay
- Gryphe/ChatGPT-4o-Writing-Prompts
- Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
- Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
- nothingiisreal/Reddit-Dirty-And-WritingPrompts
- allura-org/Celeste-1.x-data-mixture
- cognitivecomputations/dolphin-2.9.3
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
model-index:
- name: EVA-Qwen2.5-72B-SFFT-v0.2
  results: []
---


# EVA Qwen2.5-72B v0.2

<p>
An RP/storywriting specialist model, full-parameter finetune of Qwen2.5-72B on a mixture of synthetic and natural data.<br>
It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity and "flavor" of the resulting model.<br>
</p>

<p>Dedicated to Nev.</p>

<p><b>NOTE: LLM-Compressor quants don't seem to work correctly; quality appears much worse than normal. This wasn't the case with previous versions. GGUF and GPTQ appear to be unaffected.</b></p>

<p><b>Version notes for 0.2</b>: Optimized training hyperparameters and increased sequence length, resulting in better instruction following deeper into context and less repetition.</p>

<p>Prompt format is ChatML.</p>

<h3>Recommended sampler values:</h3>
<ul>
<li>Temperature: 0.8</li>
<li>Min-P: 0.05</li>
<li>Top-A: 0.3</li>
<li>Repetition Penalty: 1.03</li>
</ul>

<h3>Recommended SillyTavern preset (via CalamitousFelicitousness):</h3>
<ul><li><a href="https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2/blob/main/EV01.json">Master import</a></li></ul>
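
For reference, here is a minimal inference sketch with Hugging Face `transformers`, applying the ChatML template and the sampler values above. It assumes the `EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2` repository id and a recent `transformers` release (for `min_p`); Top-A is not a built-in `transformers` sampler and should be set in a backend that supports it (e.g. SillyTavern).

```python
# Minimal sketch: ChatML prompting with the recommended sampler values via transformers.
# Assumes a recent transformers release (min_p support) and enough VRAM/offload for a 72B model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a creative storywriting assistant."},
    {"role": "user", "content": "Write the opening scene of a slow-burn mystery."},
]
# apply_chat_template renders the ChatML format the model was trained on.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    min_p=0.05,
    repetition_penalty=1.03,
    # Top-A (0.3) is not a transformers sampler; configure it in your backend if supported.
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```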

<h3>Training data:</h3>
<ul>
<li>Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's <a href="https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16">card</a> for details.</li>
<li>Kalomaze's Opus_Instruct_25k dataset, with refusals filtered out.</li>
<li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
<li>A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe</li>
<li>Synthstruct and SynthRP datasets by Epiculous</li>
<li>A subset of Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
</ul>

<h3>Training time and hardware:</h3>
<ul><li>17 hours on 8xH100 SXM</li></ul>

<p>The model was created by Kearm, Auri and Cahvay.</p>

<h4>Special thanks:</h4>
<ul>
<li>to Featherless for sponsoring this run,</li>
<li>to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning,</li>
<li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data,</li>
<li>and to Allura-org for support, feedback, beta-testing and quality control of EVA models.</li>
</ul>

<h3>Statement about licensing changes for future models</h3>
<p>For all future EVA-Unit-01 models, there will be a provision in the license stating that Infermatic and any of its employees or paid associates cannot utilize, distribute, download, or otherwise make use of EVA models.
While this cannot apply retroactively to our existing licensing, we officially request that Infermatic immediately cease using our models for unwarranted profit, although we acknowledge that at this point the request is unlikely to be honored.
EVA models will still be available on Featherless, ArliAI (in the future), and other providers who want to host them, as well as for local and cloud usage.</p>


[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: Qwen/Qwen2.5-72B

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

# plugins:
# - axolotl.integrations.spectrum.SpectrumPlugin

# spectrum_top_fraction: 0.5
# # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
# spectrum_model_name: Qwen/Qwen2.5-32B

datasets:
- path: datasets/Celeste_Filtered_utf8fix.jsonl
  type: sharegpt
- path: datasets/deduped_not_samantha_norefusals.jsonl
  type: sharegpt
- path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
  type: sharegpt
- path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
  type: sharegpt
- path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
  type: sharegpt
- path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
  type: sharegpt
- path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
  type: sharegpt
- path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
  type: sharegpt

chat_template: chatml
shuffle_merged_datasets: true
val_set_size: 0.001
output_dir: EVA-Qwen2.5-72B-SFFT-v0.2

sequence_len: 10240
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: false

# adapter: qlora
# lora_model_dir:
# lora_r: 64
# lora_alpha: 128
# lora_dropout: 0.05
# lora_target_linear: true
# peft_use_dora: true

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.62.mlp.down_proj
- model.layers.64.mlp.down_proj
- model.layers.63.mlp.down_proj
- model.layers.66.mlp.down_proj
- model.layers.65.mlp.down_proj
- model.layers.67.mlp.down_proj
- model.layers.68.mlp.down_proj
- model.layers.31.mlp.down_proj
- model.layers.60.mlp.down_proj
- model.layers.69.mlp.down_proj
- model.layers.61.mlp.down_proj
- model.layers.59.mlp.down_proj
- model.layers.30.mlp.down_proj
- model.layers.70.mlp.down_proj
- model.layers.32.mlp.down_proj
- model.layers.34.mlp.down_proj
- model.layers.33.mlp.down_proj
- model.layers.76.mlp.down_proj
- model.layers.72.mlp.down_proj
- model.layers.71.mlp.down_proj
- model.layers.58.mlp.down_proj
- model.layers.75.mlp.down_proj
- model.layers.29.mlp.down_proj
- model.layers.56.mlp.down_proj
- model.layers.26.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.28.mlp.down_proj
- model.layers.57.mlp.down_proj
- model.layers.77.mlp.down_proj
- model.layers.36.mlp.down_proj
- model.layers.27.mlp.down_proj
- model.layers.25.mlp.down_proj
- model.layers.78.mlp.down_proj
- model.layers.37.mlp.down_proj
- model.layers.73.mlp.down_proj
- model.layers.55.mlp.down_proj
- model.layers.54.mlp.down_proj
- model.layers.74.mlp.down_proj
- model.layers.24.mlp.down_proj
- model.layers.53.mlp.down_proj
# mlp.gate_proj layers
- model.layers.78.mlp.gate_proj
- model.layers.77.mlp.gate_proj
- model.layers.76.mlp.gate_proj
- model.layers.79.mlp.gate_proj
- model.layers.75.mlp.gate_proj
- model.layers.74.mlp.gate_proj
- model.layers.73.mlp.gate_proj
- model.layers.72.mlp.gate_proj
- model.layers.71.mlp.gate_proj
- model.layers.70.mlp.gate_proj
- model.layers.69.mlp.gate_proj
- model.layers.57.mlp.gate_proj
- model.layers.54.mlp.gate_proj
- model.layers.55.mlp.gate_proj
- model.layers.68.mlp.gate_proj
- model.layers.63.mlp.gate_proj
- model.layers.53.mlp.gate_proj
- model.layers.44.mlp.gate_proj
- model.layers.45.mlp.gate_proj
- model.layers.49.mlp.gate_proj
- model.layers.58.mlp.gate_proj
- model.layers.46.mlp.gate_proj
- model.layers.56.mlp.gate_proj
- model.layers.67.mlp.gate_proj
- model.layers.62.mlp.gate_proj
- model.layers.50.mlp.gate_proj
- model.layers.64.mlp.gate_proj
- model.layers.52.mlp.gate_proj
- model.layers.40.mlp.gate_proj
- model.layers.43.mlp.gate_proj
- model.layers.48.mlp.gate_proj
- model.layers.66.mlp.gate_proj
- model.layers.47.mlp.gate_proj
- model.layers.59.mlp.gate_proj
- model.layers.65.mlp.gate_proj
- model.layers.61.mlp.gate_proj
- model.layers.60.mlp.gate_proj
- model.layers.42.mlp.gate_proj
- model.layers.51.mlp.gate_proj
- model.layers.41.mlp.gate_proj
# mlp.up_proj layers
- model.layers.70.mlp.up_proj
- model.layers.69.mlp.up_proj
- model.layers.71.mlp.up_proj
- model.layers.68.mlp.up_proj
- model.layers.72.mlp.up_proj
- model.layers.67.mlp.up_proj
- model.layers.66.mlp.up_proj
- model.layers.73.mlp.up_proj
- model.layers.46.mlp.up_proj
- model.layers.63.mlp.up_proj
- model.layers.75.mlp.up_proj
- model.layers.76.mlp.up_proj
- model.layers.74.mlp.up_proj
- model.layers.45.mlp.up_proj
- model.layers.62.mlp.up_proj
- model.layers.64.mlp.up_proj
- model.layers.65.mlp.up_proj
- model.layers.44.mlp.up_proj
- model.layers.53.mlp.up_proj
- model.layers.47.mlp.up_proj
- model.layers.49.mlp.up_proj
- model.layers.48.mlp.up_proj
- model.layers.57.mlp.up_proj
- model.layers.43.mlp.up_proj
- model.layers.42.mlp.up_proj
- model.layers.56.mlp.up_proj
- model.layers.61.mlp.up_proj
- model.layers.54.mlp.up_proj
- model.layers.40.mlp.up_proj
- model.layers.55.mlp.up_proj
- model.layers.77.mlp.up_proj
- model.layers.60.mlp.up_proj
- model.layers.41.mlp.up_proj
- model.layers.35.mlp.up_proj
- model.layers.37.mlp.up_proj
- model.layers.58.mlp.up_proj
- model.layers.34.mlp.up_proj
- model.layers.38.mlp.up_proj
- model.layers.33.mlp.up_proj
- model.layers.39.mlp.up_proj
# self_attn.k_proj layers
- model.layers.36.self_attn.k_proj
- model.layers.79.self_attn.k_proj
- model.layers.35.self_attn.k_proj
- model.layers.34.self_attn.k_proj
- model.layers.37.self_attn.k_proj
- model.layers.33.self_attn.k_proj
- model.layers.38.self_attn.k_proj
- model.layers.39.self_attn.k_proj
- model.layers.74.self_attn.k_proj
- model.layers.77.self_attn.k_proj
- model.layers.41.self_attn.k_proj
- model.layers.69.self_attn.k_proj
- model.layers.32.self_attn.k_proj
- model.layers.78.self_attn.k_proj
- model.layers.30.self_attn.k_proj
- model.layers.70.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.42.self_attn.k_proj
- model.layers.29.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.68.self_attn.k_proj
- model.layers.66.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.65.self_attn.k_proj
- model.layers.44.self_attn.k_proj
- model.layers.40.self_attn.k_proj
- model.layers.63.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.67.self_attn.k_proj
- model.layers.75.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.57.self_attn.k_proj
- model.layers.64.self_attn.k_proj
- model.layers.71.self_attn.k_proj
- model.layers.61.self_attn.k_proj
- model.layers.72.self_attn.k_proj
- model.layers.73.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.69.self_attn.o_proj
- model.layers.39.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.14.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.42.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.38.self_attn.o_proj
- model.layers.23.self_attn.o_proj
- model.layers.22.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.29.self_attn.o_proj
- model.layers.41.self_attn.o_proj
- model.layers.44.self_attn.o_proj
- model.layers.46.self_attn.o_proj
- model.layers.45.self_attn.o_proj
- model.layers.43.self_attn.o_proj
- model.layers.49.self_attn.o_proj
- model.layers.30.self_attn.o_proj
- model.layers.26.self_attn.o_proj
- model.layers.25.self_attn.o_proj
- model.layers.37.self_attn.o_proj
- model.layers.47.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.28.self_attn.o_proj
- model.layers.20.self_attn.o_proj
- model.layers.27.self_attn.o_proj
- model.layers.53.self_attn.o_proj
- model.layers.52.self_attn.o_proj
- model.layers.35.self_attn.o_proj
- model.layers.71.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.3.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.24.self_attn.o_proj
- model.layers.68.self_attn.o_proj
- model.layers.48.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.1.self_attn.q_proj
- model.layers.2.self_attn.q_proj
- model.layers.3.self_attn.q_proj
- model.layers.0.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.4.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.8.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.68.self_attn.q_proj
- model.layers.25.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.54.self_attn.q_proj
- model.layers.55.self_attn.q_proj
- model.layers.61.self_attn.q_proj
- model.layers.18.self_attn.q_proj
- model.layers.49.self_attn.q_proj
- model.layers.66.self_attn.q_proj
- model.layers.72.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.52.self_attn.q_proj
- model.layers.64.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.60.self_attn.q_proj
- model.layers.50.self_attn.q_proj
- model.layers.59.self_attn.q_proj
- model.layers.53.self_attn.q_proj
- model.layers.48.self_attn.q_proj
- model.layers.57.self_attn.q_proj
- model.layers.70.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.67.self_attn.q_proj
- model.layers.71.self_attn.q_proj
- model.layers.62.self_attn.q_proj
- model.layers.51.self_attn.q_proj
- model.layers.19.self_attn.q_proj
- model.layers.58.self_attn.q_proj
- model.layers.13.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.23.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.26.self_attn.v_proj
- model.layers.27.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.30.self_attn.v_proj
- model.layers.31.self_attn.v_proj
- model.layers.34.self_attn.v_proj
- model.layers.35.self_attn.v_proj
- model.layers.36.self_attn.v_proj
- model.layers.37.self_attn.v_proj
- model.layers.38.self_attn.v_proj
- model.layers.42.self_attn.v_proj
- model.layers.48.self_attn.v_proj
- model.layers.57.self_attn.v_proj
- model.layers.58.self_attn.v_proj
- model.layers.61.self_attn.v_proj
- model.layers.63.self_attn.v_proj
- model.layers.64.self_attn.v_proj
- model.layers.65.self_attn.v_proj
- model.layers.66.self_attn.v_proj
- model.layers.69.self_attn.v_proj
- model.layers.70.self_attn.v_proj
- model.layers.74.self_attn.v_proj
- model.layers.75.self_attn.v_proj
- model.layers.72.self_attn.v_proj
- model.layers.39.self_attn.v_proj
- model.layers.41.self_attn.v_proj
- model.layers.40.self_attn.v_proj
- model.layers.33.self_attn.v_proj
- model.layers.59.self_attn.v_proj
- model.layers.16.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.76.self_attn.v_proj
- model.layers.24.self_attn.v_proj
- model.layers.68.self_attn.v_proj
- model.layers.67.self_attn.v_proj
- model.layers.55.self_attn.v_proj
- model.layers.44.self_attn.v_proj


wandb_project: EVA-Qwen2.5-72B-SFFT-v0.2
wandb_entity:
wandb_watch:
wandb_name: Unit-02
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.00003
max_grad_norm: 1.5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: "unsloth"
# gradient_checkpointing_kwargs:
#   use_reentrant: true
early_stopping_patience:
resume_from_checkpoint: EVA-Qwen2.5-72B-SFFT-v0.2/checkpoint-128
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 4
save_safetensors: true
save_total_limit: 1
hub_model_id:
hub_strategy:
debug:
deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_params.json
weight_decay: 0.12
# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: false
#   fsdp_offload_params: true
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
#   fsdp_activation_checkpointing: true
#   fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: false # Added
#   fsdp_backward_prefetch: "BACKWARD_PRE" # Added
#   fsdp_backward_prefetch_limit: 1 # Added
#   fsdp_mixed_precision: BF16 # Added
```

</details><br>
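
The `unfrozen_parameters` list in the config restricts training to the listed projections plus the embeddings and LM head, leaving the rest of the model frozen. The sketch below is illustrative only (it assumes the usual freeze-all-then-unfreeze-matches semantics and is not axolotl's actual implementation); only a few example patterns from the list are shown.

```python
import re

# Hypothetical subset of the `unfrozen_parameters` list above, for illustration.
UNFROZEN_PATTERNS = [
    r"^lm_head.weight$",
    r"^model.embed_tokens.weight$",
    r"model.layers.62.mlp.down_proj",  # ...plus the rest of the listed layers
]

def apply_unfrozen_parameters(model, patterns=UNFROZEN_PATTERNS):
    """Freeze every parameter, then unfreeze those whose name matches a pattern."""
    for name, param in model.named_parameters():
        param.requires_grad = any(re.search(p, name) for p in patterns)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters: {trainable:,}")
```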

<h3><a href="https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard">Open LLM Leaderboard Evaluation Results</a></h3>

| Metric              | Value |
|---------------------|------:|
| Avg.                | 43.54 |
| IFEval (0-shot)     | 68.79 |
| BBH (3-shot)        | 59.07 |
| MATH Lvl 5 (4-shot) | 39.05 |
| GPQA (0-shot)       | 21.14 |
| MuSR (0-shot)       | 19.73 |
| MMLU-PRO (5-shot)   | 53.48 |