MaziyarPanahi committed
Commit 593a5b0
1 Parent(s): 6efbdcd

Update README.md (#6)


- Update README.md (11229548ed7cae8ea8e3a9ede49bcd08419b01e6)

Files changed (1)
  1. README.md +96 -89
README.md CHANGED
@@ -15,98 +15,18 @@ model-index:
  results: []
  datasets:
  - garage-bAInd/Open-Platypus
+ model_name: phi-2-logical-sft
+ inference: false
+ model_creator: MaziyarPanahi
+ pipeline_tag: text-generation
+ quantized_by: MaziyarPanahi
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.0`
- ```yaml
- base_model: microsoft/phi-2
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
-
- hub_model_id: MaziyarPanahi/phi-2-logical-sft
- hf_use_auth_token: true
-
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- datasets:
-   - path: garage-bAInd/Open-Platypus
-     type: alpaca
-
- dataset_prepared_path:
- val_set_size: 0.05
- output_dir: ./phi-2-logical-sft-out
-
- sequence_len: 4096
- sample_packing: true
- pad_to_sequence_len: true
-
- adapter:
- lora_model_dir:
- lora_r:
- lora_alpha:
- lora_dropout:
- lora_target_linear:
- lora_fan_in_fan_out:
-
- wandb_project:
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 1
- micro_batch_size: 2
- num_epochs: 2
- optimizer: adamw_torch
- adam_beta2: 0.95
- adam_epsilon: 0.00001
- max_grad_norm: 1.0
- lr_scheduler: cosine
- learning_rate: 0.000003
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: true
-
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
-   use_reentrant: True
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 100
- evals_per_epoch: 4
- saves_per_epoch: 1
- debug:
- deepspeed:
- weight_decay: 0.1
- fsdp:
- fsdp_config:
- resize_token_embeddings_to_32x: true
- special_tokens:
-   pad_token: "<|endoftext|>"
- ```
-
- </details><br>
-
- # phi-2-logical-sft
-
- This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/5fd5e18a90b6dc4633f6d292/uhDf-zhThjoAwQVAMEo2t.webp" width="600" />
+
+ # MaziyarPanahi/phi-2-logical-sft
+
+ This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the `Open-Platypus` dataset.
  It achieves the following results on the evaluation set:
  - Loss: 1.0075
@@ -245,4 +165,91 @@ The following hyperparameters were used during training:
  - Transformers 4.39.0.dev0
  - Pytorch 2.2.0+cu121
  - Datasets 2.17.0
- - Tokenizers 0.15.0
+ - Tokenizers 0.15.0
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: microsoft/phi-2
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ hub_model_id: MaziyarPanahi/phi-2-logical-sft
+ hf_use_auth_token: true
+
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+   - path: garage-bAInd/Open-Platypus
+     type: alpaca
+
+ dataset_prepared_path:
+ val_set_size: 0.05
+ output_dir: ./phi-2-logical-sft-out
+
+ sequence_len: 4096
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ adapter:
+ lora_model_dir:
+ lora_r:
+ lora_alpha:
+ lora_dropout:
+ lora_target_linear:
+ lora_fan_in_fan_out:
+
+ wandb_project:
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 1
+ micro_batch_size: 2
+ num_epochs: 2
+ optimizer: adamw_torch
+ adam_beta2: 0.95
+ adam_epsilon: 0.00001
+ max_grad_norm: 1.0
+ lr_scheduler: cosine
+ learning_rate: 0.000003
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: true
+
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: True
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 100
+ evals_per_epoch: 4
+ saves_per_epoch: 1
+ debug:
+ deepspeed:
+ weight_decay: 0.1
+ fsdp:
+ fsdp_config:
+ resize_token_embeddings_to_32x: true
+ special_tokens:
+   pad_token: "<|endoftext|>"
+ ```
+
+ </details><br>
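
The card in this diff points at the `MaziyarPanahi/phi-2-logical-sft` checkpoint (the config's `hub_model_id`), fine-tuned on Alpaca-formatted `Open-Platypus` data. As a rough illustration only, the snippet below sketches how such a checkpoint could be loaded and prompted with the Transformers `AutoTokenizer`/`AutoModelForCausalLM` API; the Alpaca-style prompt template and the generation settings are assumptions made for this sketch, not something the card specifies.

```python
# Minimal sketch. Assumptions: repo id taken from the config's hub_model_id;
# an Alpaca-style prompt, inferred from the dataset `type: alpaca` in the config.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaziyarPanahi/phi-2-logical-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; drop it to load on CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "If all bloops are razzies and all razzies are lazzies, are all bloops lazzies?\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```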