---
license: apache-2.0
datasets:
- Mielikki/Erebus-87k
- FourOhFour/Instruct_Phase
- FourOhFour/RP_Phase
- anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
language:
- en
base_model:
- IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
---
## Aura-4B

![image/png](https://cdn-uploads.huggingface.co/production/uploads/626dfb8786671a29c715f8a9/jT4LeWC0ioarPieWtNZkE.png)

## Introduction

**Aura-4B** is a state-of-the-art dedicated roleplaying model designed to fulfill your every desire.

This finetune has seen several hundred million tokens of instruction and roleplaying data. Kahneman-Tversky Optimization (KTO) was then applied to give the model its distinctive output style.

Developed by **Aura Industries**, with contributions from **Anthracite Org**.

## Model Details

- **Model Name**: Aura-4B
- **Base Model**: [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml)
- **Model Type**: Chat Completions
- **Prompt Format**: ChatML (see the usage sketch below)
- **License**: Apache-2.0
- **Language**: English
- **Max Context**: 8,192 tokens
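
The model speaks ChatML end to end. Below is a minimal inference sketch rather than an official recipe: the repository id is a placeholder, and it assumes the tokenizer carries the ChatML chat template inherited from the base model.

```python
# Minimal ChatML inference sketch. Assumptions: the repo id is a placeholder and
# the tokenizer ships a ChatML chat_template, as the base model's name implies.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jeiku/Aura-4B"  # placeholder; substitute this repository's actual id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Aura, a vivid roleplaying partner."},
    {"role": "user", "content": "Set the scene: a rain-soaked neon city at midnight."},
]

# Render the ChatML turns into input ids and append the assistant header.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```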

## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Quantizations

Coming soon...

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

| Metric              | Value |
|---------------------|------:|
| Avg.                |   N/A |
| IFEval (0-Shot)     |   N/A |
| BBH (3-Shot)        |   N/A |
| MATH Lvl 5 (4-Shot) |   N/A |
| GPQA (0-shot)       |   N/A |
| MuSR (0-shot)       |   N/A |
| MMLU-PRO (5-shot)   |   N/A |

## Training Configuration

<details><summary>Click here for Axolotl configs</summary>

Completion SFT

```yaml
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/completion4B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

datasets:
  - path: Mielikki/Erebus-87k
    type: completion
    field: body

shuffle_merged_datasets: true
val_set_size: 0.0025
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: EXP4B
wandb_entity:
wandb_watch:
wandb_name: EXP4B
wandb_log_model:

gradient_accumulation_steps: 12
micro_batch_size: 3
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```
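
The completion phase trains on raw prose: with `type: completion` and `field: body`, Axolotl reads only the named column and packs it into `sequence_len` chunks. A hypothetical row (content invented for illustration) looks like this:

```python
# Hypothetical row for `type: completion` with `field: body` (illustrative text,
# not an actual Erebus-87k record); only the "body" text is tokenized and packed.
example_row = {
    "body": "The harbor lights guttered as the storm finally broke over the city...",
}
```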

Instruct SFT

```yaml
base_model: jeiku/completion4B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/instructered4B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

datasets:
  - path: FourOhFour/Instruct_Phase
    type: sharegpt
    conversation: chatml

chat_template: chatml

shuffle_merged_datasets: true
val_set_size: 0.0025
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: EXP4B
wandb_entity:
wandb_watch:
wandb_name: EXP4B
wandb_log_model:

gradient_accumulation_steps: 12
micro_batch_size: 3
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```
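
The instruct phase, like the roleplaying phase below, consumes ShareGPT-style conversations and renders them with the ChatML template. A sketch of the expected row shape follows; the turn content is invented for illustration and the actual datasets may carry extra fields.

```python
# Hypothetical ShareGPT-style row for `type: sharegpt` with `conversation: chatml`;
# each turn's "from"/"value" pair is mapped onto ChatML <|im_start|>/<|im_end|> blocks.
example_row = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "Explain sample packing in one sentence."},
        {"from": "gpt", "value": "Sample packing concatenates short examples into one sequence so fewer tokens are wasted on padding."},
    ]
}
```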

Roleplaying SFT

```yaml
base_model: jeiku/instructered4B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/TheBest4B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

datasets:
  - path: FourOhFour/RP_Phase
    type: sharegpt
    conversation: chatml

chat_template: chatml

shuffle_merged_datasets: true
val_set_size: 0.0025
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: EXP4B
wandb_entity:
wandb_watch:
wandb_name: EXP4B
wandb_log_model:

gradient_accumulation_steps: 12
micro_batch_size: 3
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

KTO

```yaml
base_model: FourOhFour/Crispy_Crab_4B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/aura4bkto
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

chat_template: chatml

rl: kto
rl_beta: 0.2
kto_desirable_weight: 0.2

datasets:
  - path: anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
    type: chatml.argilla

shuffle_merged_datasets: true
val_set_size: 0.0
output_dir: ./outputs/out

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: Aura-4B
wandb_entity:
wandb_watch:
wandb_name: Aura-4B
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
max_steps: 500

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
remove_unused_columns: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```
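
The final phase switches to preference tuning: `rl: kto` applies Kahneman-Tversky Optimization, with `rl_beta` controlling how far the policy may drift from the reference model and `kto_desirable_weight` down-weighting the desirable (chosen) examples. A hypothetical illustration of the kind of row this phase consumes is shown below; the field names and content are assumptions for illustration, not the exact schema of the dataset or the `chatml.argilla` loader.

```python
# Hypothetical KTO preference row (field names are illustrative assumptions, not
# the exact schema used above); KTO scores the chosen completion as desirable and
# the rejected completion as undesirable rather than requiring strictly paired ranks.
example_row = {
    "system": "You are a helpful assistant.",
    "instruction": "Write a two-line poem about rain.",
    "chosen_response": "Silver threads unspool from a pewter sky,\nstitching puddles where the streetlights lie.",
    "rejected_response": "Rain is water that falls down. It is wet.",
}
```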
405
+ </details><br>