---
license: apache-2.0
datasets:
- jeiku/Writing
- FourOhFour/RP_Phase
- anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
language:
- en
base_model:
- IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
---
## Aura-MoE-2x4B-v2

![image/png](https://cdn-uploads.huggingface.co/production/uploads/626dfb8786671a29c715f8a9/LpCTIR45g099eXDIwYmKa.png)

## Introduction

**Aura-MoE-2x4B-v2** is a state-of-the-art dedicated roleplaying model designed to fulfill your every desire.

The finetunes used in this merge saw several hundred million tokens of instruction data. The merged model was then healed on 150 million tokens of roleplaying data, and Kahneman-Tversky Optimization (KTO) was applied to the healed model to give it a unique output style.

Developed by **Aura Industries**, with contributions from **Anthracite Org**.

## Model Details

- **Model Name**: Aura-MoE-2x4B-v2
- **Base Model**: [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml)
- **Model Type**: Chat Completions
- **Prompt Format**: ChatML
- **License**: Apache-2.0
- **Language**: English
- **Max Context**: 8,192+ tokens

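Since the prompt format is ChatML, the tokenizer's built-in chat template can render turns for you. A minimal inference sketch with `transformers` (the repo id below is an assumption for illustration; substitute the actual published checkpoint):

```python
# Minimal inference sketch using the ChatML chat template via transformers.
# The repo id is a placeholder; point it at the actual checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jeiku/Aura-MoE-2x4B-v2"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a creative roleplaying partner."},
    {"role": "user", "content": "Describe the tavern we just walked into."},
]
# apply_chat_template renders the ChatML turns (<|im_start|> ... <|im_end|>).
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The sampler settings above are only a starting point; adjust them to taste for roleplay.
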
## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Quantizations

Due to the unusual architecture of this model, only static GGUF quantization is available.

Coming soon...

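Once GGUF files are published, a static quant should be runnable with llama.cpp or its Python bindings. A minimal sketch using `llama-cpp-python`, with a hypothetical file name (no quant files exist yet, so treat the path as a placeholder):

```python
# Sketch of chat inference over a static GGUF quant via llama-cpp-python.
# The file name is a placeholder; use whichever quant is actually released.
from llama_cpp import Llama

llm = Llama(
    model_path="aura-moe-2x4b-v2.Q8_0.gguf",  # hypothetical quant file
    n_ctx=8192,       # matches the advertised context length
    n_gpu_layers=-1,  # offload all layers to GPU when available
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a creative roleplaying partner."},
        {"role": "user", "content": "Set the scene for a heist in a floating city."},
    ],
    max_tokens=256,
    temperature=0.8,
)
print(result["choices"][0]["message"]["content"])
```
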
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Coming soon...

| Metric              | Value |
|---------------------|------:|
| Avg.                |   N/A |
| IFEval (0-Shot)     |   N/A |
| BBH (3-Shot)        |   N/A |
| MATH Lvl 5 (4-Shot) |   N/A |
| GPQA (0-Shot)       |   N/A |
| MuSR (0-Shot)       |   N/A |
| MMLU-PRO (5-Shot)   |   N/A |

## Training Configuration

<details><summary>Click here for Mergekit and Axolotl configs</summary>

MoE Merge

```yaml
base_model: FourOhFour/Zenith_4B
gate_mode: random
dtype: bfloat16
experts_per_token: 1
experts:
  - source_model: FourOhFour/Luxe_4B
  - source_model: FourOhFour/Zenith_4B
```

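With `gate_mode: random` and `experts_per_token: 1`, the router weights are initialized randomly and each token is dispatched to exactly one of the two 4B experts. A toy PyTorch sketch of that top-1 routing, purely illustrative (dimensions and module layout are placeholders, not the real model's):

```python
# Illustrative top-1 expert routing for a two-expert MoE layer.
# This is a toy sketch, not mergekit's or transformers' actual implementation.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, hidden: int, ffn: int, num_experts: int = 2):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts, bias=False)  # randomly initialized router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        weights = self.gate(x).softmax(dim=-1)            # (tokens, num_experts)
        top_w, top_idx = weights.max(dim=-1)              # experts_per_token: 1
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = Top1MoE(hidden=64, ffn=256)  # toy sizes
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```
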
SFT

```yaml
base_model: jeiku/MoEv2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: FourOhFour/RP_Phase
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: jeiku/Writing
    type: completion
    field: text

chat_template: chatml

shuffle_merged_datasets: true
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./output/out

hub_model_id: jeiku/Aura-MoEv2
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len:

wandb_project: Aura-MoEv2
wandb_entity:
wandb_watch:
wandb_name: Aura-MoEv2
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00005

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

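In the SFT stage, the `chat_template` dataset type renders ShareGPT-style records (a `conversations` list with `from`/`value` fields) into ChatML, and `roles_to_train: ["gpt"]` restricts the loss to assistant turns. A small illustrative sketch of that mapping (axolotl handles this, plus the loss masking, during preprocessing):

```python
# Illustrative rendering of a ShareGPT-style record into ChatML text.
# Axolotl's chat_template pipeline performs this (and the loss masking) internally.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_chatml(conversations: list[dict]) -> str:
    parts = []
    for turn in conversations:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(parts)

record = {
    "conversations": [
        {"from": "system", "value": "You are a creative roleplaying partner."},
        {"from": "human", "value": "The airlock hisses open."},
        {"from": "gpt", "value": "Cold mist rolls across the deck plating..."},
    ]
}
print(to_chatml(record["conversations"]))
# With roles_to_train: ["gpt"], only the assistant turns contribute to the loss.
```
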
KTO

```yaml
base_model: jeiku/Aura-MoEv2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/moekto
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

chat_template: chatml

rl: kto
rl_beta: 0.2
kto_desirable_weight: 0.2

datasets:
  - path: anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
    type: chatml.argilla

shuffle_merged_datasets: true
val_set_size: 0.0
output_dir: ./outputs/out

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: moekto
wandb_entity:
wandb_watch:
wandb_name: moekto
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
max_steps: 500

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
remove_unused_columns: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```
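
For the `rl: kto` stage, `rl_beta` is the β of the KTO objective and `kto_desirable_weight` scales the loss on desirable (chosen) examples relative to undesirable ones. A rough sketch of the per-example KTO value term as described in Ethayarajh et al. (2024), not the exact TRL/axolotl implementation:

```python
# Rough sketch of the per-example KTO value term (Ethayarajh et al., 2024).
# beta / desirable_weight mirror rl_beta / kto_desirable_weight in the config above.
# Not the actual trainer code: there, z0 is a per-batch estimate of
# KL(policy || reference) rather than a number passed in by hand.
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def kto_value(logp_policy: float, logp_ref: float, z0: float, desirable: bool,
              beta: float = 0.2, desirable_weight: float = 0.2,
              undesirable_weight: float = 1.0) -> float:
    reward = logp_policy - logp_ref  # implicit reward: log pi_theta(y|x) - log pi_ref(y|x)
    if desirable:
        return desirable_weight * sigmoid(beta * (reward - z0))
    return undesirable_weight * sigmoid(beta * (z0 - reward))

# The per-example loss is (weight - value), so raising the value lowers the loss.
print(kto_value(logp_policy=-12.0, logp_ref=-14.0, z0=0.5, desirable=True))
```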
</details><br>