---
license: apache-2.0
language:
- en
datasets:
- FourOhFour/RP_Phase
- Nitral-AI/Cybersecurity-ShareGPT
- Nitral-AI/Medical_Instruct-ShareGPT
- Nitral-AI/Olympiad_Math-ShareGPT
- NewEden/Claude-Instruct-5K
- lodrick-the-lafted/kalo-opus-instruct-3k-filtered
- Nitral-AI/Creative_Writing-ShareGPT
- jeiku/Writing
- anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
base_model:
- arcee-ai/Llama-3.1-SuperNova-Lite
---
---
### These are EXL2 quants for Aura-8B. The measurement file is in the main branch; check the revisions for the different BPW sizes.
---
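
The note above means each bits-per-weight (BPW) variant lives on its own revision of this repo, with the measurement file kept on `main`. The branch name below is a placeholder rather than a confirmed revision; a minimal `huggingface_hub` sketch for grabbing one variant:

```python
# Minimal sketch: download one BPW revision of the EXL2 repo.
# "4.0bpw" is a hypothetical branch name -- check this repo's revision list
# for the BPW sizes that actually exist; the measurement file stays on main.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="NewEden/Aura-8B-EXL2",
    revision="4.0bpw",  # substitute a real BPW branch
)
print(local_dir)
```
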
## Aura-8B

![image/png](https://cdn-uploads.huggingface.co/production/uploads/626dfb8786671a29c715f8a9/9y03nVWVnBYU1tHkLwCJy.png)

## Introduction

**Aura-8B** is a state-of-the-art dedicated roleplaying model designed to fulfill your every desire.

This finetune has seen several hundred million tokens of instruction and roleplaying data. Kahneman-Tversky Optimization (KTO) was then applied as a low-rank adapter to give the model a unique output style.

Developed by **Aura Industries**, with contributions from **Anthracite Org**.

## Model Details

- **Model Name**: Aura-8B
- **Base Model**: [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite)
- **Model Type**: Chat Completions
- **Prompt Format**: Llama 3
- **License**: Apache-2.0
- **Language**: English
- **Max Context**: 8,192+ tokens

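The model expects the standard Llama 3 chat format. A minimal sketch of building a prompt with `transformers`, using the base model's tokenizer (assumed to carry the same chat template as the finetune):

```python
# Minimal sketch: format a conversation with the Llama 3 chat template.
# The base model's tokenizer is used as a stand-in; the finetune's own
# tokenizer should behave the same if you have it locally.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arcee-ai/Llama-3.1-SuperNova-Lite")

messages = [
    {"role": "system", "content": "You are Aura, an expressive roleplay partner."},
    {"role": "user", "content": "Set the scene for a rainy night in the city."},
]

# add_generation_prompt appends the assistant header so the model answers next
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```
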
## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Quantizations

[Static GGUF](https://huggingface.co/mradermacher/Aura-8B-GGUF)

[Imatrix GGUF](https://huggingface.co/mradermacher/Aura-8B-i1-GGUF)

[EXL2](https://huggingface.co/NewEden/Aura-8B-EXL2)

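For the GGUF quants, `llama-cpp-python` can pull a file straight from the static repo. A minimal sketch; the `Q4_K_M` filename glob is an assumption, so pick a quant that actually exists in the repo's file list:

```python
# Minimal sketch: run a GGUF quant locally with llama-cpp-python.
# The filename glob is assumed -- substitute a file present in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mradermacher/Aura-8B-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant level
    n_ctx=8192,               # matches the card's max context
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in character."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```
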
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

| Metric              | Value |
|---------------------|------:|
| Avg.                | 27.34 |
| IFEval (0-shot)     | 72.05 |
| BBH (3-shot)        | 30.98 |
| MATH Lvl 5 (4-shot) | 15.03 |
| GPQA (0-shot)       |  4.81 |
| MuSR (0-shot)       |  9.22 |
| MMLU-PRO (5-shot)   | 31.93 |

## Training Configuration

<details><summary>Click here for Axolotl configs</summary>

SFT
```yaml
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: FourOhFour/RP_Phase
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: Nitral-AI/Cybersecurity-ShareGPT
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: Nitral-AI/Medical_Instruct-ShareGPT
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: Nitral-AI/Olympiad_Math-ShareGPT
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: NewEden/Claude-Instruct-5k
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: lodrick-the-lafted/kalo-opus-instruct-3k-filtered
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: Nitral-AI/Creative_Writing-ShareGPT
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: jeiku/Writing
    type: completion
    field: text

shuffle_merged_datasets: true
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./output/out

hub_model_id: jeiku/Aura-8B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len:

wandb_project: Aura-8B
wandb_entity:
wandb_watch:
wandb_name: Aura-8B
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|eot_id|>
```
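
For reference, the `chat_template` dataset entries above consume ShareGPT-style records: each row has a `conversations` list whose turns keep the speaker in `from` and the text in `value`, and only the `gpt` turns are trained on (`roles_to_train: ["gpt"]`). A minimal sketch of one such record, with invented contents:

```python
# Shape of a single ShareGPT-style row matching the keys in the SFT config:
# field_messages=conversations, message_field_role=from, message_field_content=value.
# The text is illustrative only; loss is computed on the "gpt" turns.
record = {
    "conversations": [
        {"from": "system", "value": "Enter roleplay mode."},
        {"from": "human", "value": "Describe the tavern we just walked into."},
        {"from": "gpt", "value": "Lantern light spills over rough oak tables..."},
    ]
}
```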

KTO
```yaml
base_model: jeiku/Aura-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/aurakto
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

chat_template: llama3

rl: kto
rl_beta: 0.2
kto_desirable_weight: 0.2

datasets:
  - path: anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
    type: llama3.argilla

shuffle_merged_datasets: true
val_set_size: 0.0
output_dir: ./outputs/out

adapter: lora
lora_model_dir:

lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: Aura-8B
wandb_entity:
wandb_watch:
wandb_name: Aura-8B
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
max_steps: 500

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
remove_unused_columns: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|eot_id|>
```
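
The KTO stage above trains a LoRA (`adapter: lora`) on top of `jeiku/Aura-8B` and pushes checkpoints to `jeiku/aurakto`. As a rough illustration of the merge step such a run usually ends with, here is a minimal `peft` sketch; the repo ids are taken from the configs, and whether the released weights were produced exactly this way is not stated on the card:

```python
# Minimal sketch: fold a KTO-trained LoRA adapter back into the SFT checkpoint.
# Repo ids come from the hub_model_id fields above; illustrative only.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "jeiku/Aura-8B", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "jeiku/aurakto").merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("jeiku/Aura-8B")
merged.save_pretrained("./Aura-8B-kto-merged")
tokenizer.save_pretrained("./Aura-8B-kto-merged")
```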
</details><br>