---
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- generated_from_trainer
model-index:
- name: workspace/axolotl/dolphin-2.9.4-llama3.1-8b
  results: []
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/dolphin-2.9.4-llama3.1-8b-GGUF
This is a quantized version of [cognitivecomputations/dolphin-2.9.4-llama3.1-8b](https://huggingface.co/cognitivecomputations/dolphin-2.9.4-llama3.1-8b), created with llama.cpp.
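
For a quick local test, here is a minimal inference sketch using `llama-cpp-python`; the quant filename pattern and context length below are assumptions, so check this repo's file list for the quant files actually provided:

```python
# Minimal sketch: pull a GGUF quant from this repo and run it with llama-cpp-python.
# Assumptions: `pip install llama-cpp-python huggingface_hub`, and a *Q4_K_M.gguf file
# exists in the repo (pick whichever quant level fits your hardware).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/dolphin-2.9.4-llama3.1-8b-GGUF",
    filename="*Q4_K_M.gguf",  # hypothetical pattern; adjust to an actual file in the repo
    n_ctx=8192,               # matches the sequence_len used during fine-tuning
)

# Dolphin is trained on ChatML-formatted conversations, so use the chat API.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
        {"role": "user", "content": "Write a haiku about quantization."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```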

# Original Model Card

# Warning: it's not working yet; we recommend holding off on downloading

<details><summary>Evals</summary>

```
hf (pretrained=/workspace/axolotl/dolphin-2.9.4-llama3.1-8b-hf,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (4)
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard |N/A |none | 0|acc |↑ |0.2926|± |0.0041|
| | |none | 0|acc_norm |↑ |0.4513|± |0.0053|
| | |none | 0|exact_match |↑ |0.0982|± |0.0079|
| | |none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_bbh |N/A |none | 3|acc_norm |↑ |0.4931|± |0.0061|
| - leaderboard_bbh_boolean_expressions | 0|none | 3|acc_norm |↑ |0.8000|± |0.0253|
| - leaderboard_bbh_causal_judgement | 0|none | 3|acc_norm |↑ |0.5615|± |0.0364|
| - leaderboard_bbh_date_understanding | 0|none | 3|acc_norm |↑ |0.4520|± |0.0315|
| - leaderboard_bbh_disambiguation_qa | 0|none | 3|acc_norm |↑ |0.6640|± |0.0299|
| - leaderboard_bbh_formal_fallacies | 0|none | 3|acc_norm |↑ |0.5600|± |0.0315|
| - leaderboard_bbh_geometric_shapes | 0|none | 3|acc_norm |↑ |0.3640|± |0.0305|
| - leaderboard_bbh_hyperbaton | 0|none | 3|acc_norm |↑ |0.6320|± |0.0306|
| - leaderboard_bbh_logical_deduction_five_objects | 0|none | 3|acc_norm |↑ |0.4600|± |0.0316|
| - leaderboard_bbh_logical_deduction_seven_objects | 0|none | 3|acc_norm |↑ |0.4360|± |0.0314|
| - leaderboard_bbh_logical_deduction_three_objects | 0|none | 3|acc_norm |↑ |0.6160|± |0.0308|
| - leaderboard_bbh_movie_recommendation | 0|none | 3|acc_norm |↑ |0.7880|± |0.0259|
| - leaderboard_bbh_navigate | 0|none | 3|acc_norm |↑ |0.5200|± |0.0317|
| - leaderboard_bbh_object_counting | 0|none | 3|acc_norm |↑ |0.4520|± |0.0315|
| - leaderboard_bbh_penguins_in_a_table | 0|none | 3|acc_norm |↑ |0.5205|± |0.0415|
| - leaderboard_bbh_reasoning_about_colored_objects | 0|none | 3|acc_norm |↑ |0.5120|± |0.0317|
| - leaderboard_bbh_ruin_names | 0|none | 3|acc_norm |↑ |0.6320|± |0.0306|
| - leaderboard_bbh_salient_translation_error_detection | 0|none | 3|acc_norm |↑ |0.4320|± |0.0314|
| - leaderboard_bbh_snarks | 0|none | 3|acc_norm |↑ |0.5843|± |0.0370|
| - leaderboard_bbh_sports_understanding | 0|none | 3|acc_norm |↑ |0.7040|± |0.0289|
| - leaderboard_bbh_temporal_sequences | 0|none | 3|acc_norm |↑ |0.1440|± |0.0222|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 0|none | 3|acc_norm |↑ |0.1560|± |0.0230|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects| 0|none | 3|acc_norm |↑ |0.1320|± |0.0215|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects| 0|none | 3|acc_norm |↑ |0.2840|± |0.0286|
| - leaderboard_bbh_web_of_lies | 0|none | 3|acc_norm |↑ |0.4840|± |0.0317|
| - leaderboard_gpqa |N/A |none | 0|acc_norm |↑ |0.2903|± |0.0132|
| - leaderboard_gpqa_diamond | 1|none | 0|acc_norm |↑ |0.2980|± |0.0326|
| - leaderboard_gpqa_extended | 1|none | 0|acc_norm |↑ |0.2839|± |0.0193|
| - leaderboard_gpqa_main | 1|none | 0|acc_norm |↑ |0.2946|± |0.0216|
| - leaderboard_ifeval | 2|none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_math_algebra_hard | 1|none | 4|exact_match |↑ |0.1596|± |0.0209|
| - leaderboard_math_counting_and_prob_hard | 1|none | 4|exact_match |↑ |0.0488|± |0.0195|
| - leaderboard_math_geometry_hard | 1|none | 4|exact_match |↑ |0.0530|± |0.0196|
| - leaderboard_math_hard |N/A |none | 4|exact_match |↑ |0.0982|± |0.0079|
| - leaderboard_math_intermediate_algebra_hard | 1|none | 4|exact_match |↑ |0.0143|± |0.0071|
| - leaderboard_math_num_theory_hard | 1|none | 4|exact_match |↑ |0.0455|± |0.0168|
| - leaderboard_math_prealgebra_hard | 1|none | 4|exact_match |↑ |0.2591|± |0.0316|
| - leaderboard_math_precalculus_hard | 1|none | 4|exact_match |↑ |0.0519|± |0.0192|
| - leaderboard_mmlu_pro | 0.1|none | 5|acc |↑ |0.2926|± |0.0041|
| - leaderboard_musr |N/A |none | 0|acc_norm |↑ |0.3862|± |0.0173|
| - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm |↑ |0.5280|± |0.0316|
| - leaderboard_musr_object_placements | 1|none | 0|acc_norm |↑ |0.3594|± |0.0300|
| - leaderboard_musr_team_allocation | 1|none | 0|acc_norm |↑ |0.2720|± |0.0282|

| Groups |Version|Filter|n-shot| Metric | |Value | |Stderr|
|------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard |N/A |none | 0|acc |↑ |0.2926|± |0.0041|
| | |none | 0|acc_norm |↑ |0.4513|± |0.0053|
| | |none | 0|exact_match |↑ |0.0982|± |0.0079|
| | |none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_bbh |N/A |none | 3|acc_norm |↑ |0.4931|± |0.0061|
| - leaderboard_gpqa |N/A |none | 0|acc_norm |↑ |0.2903|± |0.0132|
| - leaderboard_math_hard|N/A |none | 4|exact_match |↑ |0.0982|± |0.0079|
| - leaderboard_musr |N/A |none | 0|acc_norm |↑ |0.3862|± |0.0173|
```

</details>
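
The header above shows these numbers come from lm-evaluation-harness run against the un-quantized HF checkpoint in bfloat16 with automatic batch sizing. A rough sketch of reproducing that run through the harness's Python API (assuming a recent `lm-eval` install that ships the `leaderboard` task group; the Hub model id stands in for the local `/workspace` path):

```python
# Sketch: re-run the Open LLM Leaderboard task group with lm-evaluation-harness.
# Assumptions: `pip install lm-eval` (recent version with the "leaderboard" group),
# enough GPU memory for the bf16 checkpoint, and network access to the Hub.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cognitivecomputations/dolphin-2.9.4-llama3.1-8b,dtype=bfloat16",
    tasks=["leaderboard"],
    batch_size="auto",
)
print(results["results"])  # per-task metrics, as in the table above
```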

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
# load_in_4bit: true
strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9.4/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml
# adapter: qlora
# lora_r: 128
# lora_alpha: 16
# lora_modules_to_save: [embed_tokens, lm_head]
# lora_dropout: 0.05
# lora_target_linear: true

unfrozen_parameters:
- input_layernorm
- model.norm
- post_attention_layernorm
- self_attn.rotary_emb
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.1.mlp.down_proj
- model.layers.0.mlp.down_proj
- model.layers.30.mlp.down_proj
- model.layers.2.mlp.down_proj
- model.layers.21.mlp.down_proj
- model.layers.22.mlp.down_proj
- model.layers.29.mlp.down_proj
- model.layers.5.mlp.down_proj
- model.layers.4.mlp.down_proj
- model.layers.20.mlp.down_proj
- model.layers.23.mlp.down_proj
- model.layers.19.mlp.down_proj
- model.layers.3.mlp.down_proj
- model.layers.17.mlp.down_proj
- model.layers.6.mlp.down_proj
- model.layers.31.mlp.down_proj
# mlp.up_proj layers
- model.layers.4.mlp.up_proj
- model.layers.3.mlp.up_proj
- model.layers.0.mlp.up_proj
- model.layers.5.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.6.mlp.up_proj
- model.layers.2.mlp.up_proj
- model.layers.1.mlp.up_proj
- model.layers.8.mlp.up_proj
- model.layers.12.mlp.up_proj
- model.layers.14.mlp.up_proj
- model.layers.9.mlp.up_proj
- model.layers.15.mlp.up_proj
- model.layers.17.mlp.up_proj
- model.layers.13.mlp.up_proj
- model.layers.19.mlp.up_proj
# self_attn.k_proj layers
- model.layers.29.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.20.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.17.self_attn.k_proj
- model.layers.11.self_attn.k_proj
- model.layers.18.self_attn.k_proj
- model.layers.14.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.14.self_attn.o_proj
- model.layers.7.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.24.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.8.self_attn.o_proj
- model.layers.25.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.23.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.16.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.8.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.0.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.1.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.26.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.26.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.3.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.21.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.16.self_attn.v_proj
- model.layers.20.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.6.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.4.self_attn.v_proj
- model.layers.1.self_attn.v_proj
- model.layers.22.self_attn.v_proj
- model.layers.14.self_attn.v_proj
# mlp.gate_proj layers
- model.layers.1.mlp.gate_proj
- model.layers.2.mlp.gate_proj
- model.layers.3.mlp.gate_proj
- model.layers.4.mlp.gate_proj
- model.layers.0.mlp.gate_proj
- model.layers.25.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.5.mlp.gate_proj
- model.layers.24.mlp.gate_proj
- model.layers.28.mlp.gate_proj
- model.layers.23.mlp.gate_proj
- model.layers.27.mlp.gate_proj
- model.layers.21.mlp.gate_proj
- model.layers.22.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.20.mlp.gate_proj

dataset_prepared_path: /workspace/axolotl/dolph-2.9.4-nemo-prepared
val_set_size: 0.01
output_dir: /workspace/axolotl/dolphin-2.9.4-llama3.1-8b

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9.4-llama3.1-8b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 5e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32:

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
# evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
save_total_limit: 2
save_steps:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
special_tokens:
  eos_token: "<|im_end|>"
  bos_token: "<|begin_of_text|>"
  pad_token: "<|finetune_right_pad_id|>"
tokens:
  - "<|im_start|>"

# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: true
#   fsdp_offload_params: true
#   fsdp_use_orig_params: false
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_transformer_layer_cls_to_wrap: MixtralSparseMoeBlock
#   fsdp_state_dict_type: FULL_STATE_DICT
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: false
#   fsdp_backward_prefetch: BACKWARD_PRE
```

</details><br>
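
Since the config above sets `chat_template: chatml` and adds the `<|im_start|>` / `<|im_end|>` tokens, prompts at inference time should follow ChatML. A minimal sketch of building such a prompt (assuming the original repo's tokenizer ships this chat template):

```python
# Sketch: render a ChatML prompt with the fine-tuned model's tokenizer.
# Assumption: the tokenizer of the original (non-GGUF) repo carries the ChatML template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("cognitivecomputations/dolphin-2.9.4-llama3.1-8b")

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Explain sample packing in one sentence."},
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected shape (ChatML):
# <|im_start|>system
# You are Dolphin, a helpful AI assistant.<|im_end|>
# <|im_start|>user
# Explain sample packing in one sentence.<|im_end|>
# <|im_start|>assistant
```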

# workspace/axolotl/dolphin-2.9.4-llama3.1-8b

This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the dataset listed in the axolotl config above.
It achieves the following results on the evaluation set:
- Loss: 0.5655

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3

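As a quick sanity check (not part of the original card), the reported total batch sizes follow directly from the per-device batch size, gradient accumulation, and device count listed above:

```python
# Effective batch sizes implied by the hyperparameters listed above.
micro_batch_size = 2          # per-device train batch size
gradient_accumulation = 16
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 256
print(total_eval_batch_size)   # 16
```
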
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.5837 | 1.0180 | 1161 | 0.5814 |
| 0.5525 | 2.0179 | 2322 | 0.5671 |
| 0.5514 | 2.9624 | 3420 | 0.5655 |

### Framework versions

- Transformers 4.44.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1