.ipynb_checkpoints/README-checkpoint.md DELETED
@@ -1,151 +0,0 @@
- ---
- license: mit
- base_model: microsoft/phi-2
- tags:
- - generated_from_trainer
- model-index:
- - name: phi-sft-out
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.0`
- ```yaml
- base_model: microsoft/phi-2
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- datasets:
-   - path: Intel/orca_dpo_pairs
-     type:
-       system_prompt: ""
-       field_system: system
-       field_instruction: question
-       field_output: rejected
-       field_output: chosen
-
- dataset_prepared_path:
- val_set_size: 0.05
- output_dir: ./phi-sft-out
-
- sequence_len: 2048
- sample_packing: true
- pad_to_sequence_len: true
-
- adapter:
- lora_model_dir:
- lora_r:
- lora_alpha:
- lora_dropout:
- lora_target_linear:
- lora_fan_in_fan_out:
-
- wandb_project:
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 1
- micro_batch_size: 2
- num_epochs: 2
- optimizer: adamw_torch
- adam_beta2: 0.95
- adam_epsilon: 0.00001
- max_grad_norm: 1.0
- lr_scheduler: cosine
- learning_rate: 0.000003
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: true
-
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
-   use_reentrant: True
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 100
- evals_per_epoch: 4
- saves_per_epoch: 1
- debug:
- deepspeed:
- weight_decay: 0.1
- fsdp:
- fsdp_config:
- resize_token_embeddings_to_32x: true
- special_tokens:
-   pad_token: "<|endoftext|>"
-
- ```
-
- </details><br>
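A note on the `datasets` stanza in the config above: it sets `field_output` twice (`rejected`, then `chosen`). Most YAML loaders keep only the last occurrence of a duplicated key, so the run would effectively train on the `chosen` responses, while stricter loaders reject the file outright. A minimal sketch of an unambiguous version, assuming the intent was to fine-tune on the chosen answers only:

```yaml
datasets:
  - path: Intel/orca_dpo_pairs
    type:
      system_prompt: ""
      field_system: system
      field_instruction: question
      field_output: chosen   # single output field; the duplicate "rejected" line is dropped
```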
-
- # phi-sft-out
-
- This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.2999
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 3e-06
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 2
-
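For readers mapping this auto-generated list back to the axolotl config in the details block above, the same values appear there under axolotl's key names (restated for cross-reference only; no settings beyond those reported are assumed):

```yaml
learning_rate: 0.000003   # reported as 3e-06
micro_batch_size: 2       # reported train_batch_size / eval_batch_size of 2
optimizer: adamw_torch    # Adam
adam_beta2: 0.95          # betas=(0.9, 0.95); beta1 is left at its 0.9 default
adam_epsilon: 0.00001     # epsilon=1e-05
lr_scheduler: cosine
warmup_steps: 100
num_epochs: 2
```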
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | 1.3053 | 0.0 | 1 | 1.3288 |
- | 1.2314 | 0.25 | 287 | 1.3183 |
- | 1.1664 | 0.5 | 574 | 1.3090 |
- | 1.4349 | 0.75 | 861 | 1.3034 |
- | 1.4875 | 1.0 | 1148 | 1.3012 |
- | 1.3461 | 1.23 | 1435 | 1.3006 |
- | 1.3247 | 1.48 | 1722 | 1.2998 |
- | 1.2906 | 1.73 | 2009 | 1.2999 |
-
-
- ### Framework versions
-
- - Transformers 4.37.0
- - Pytorch 2.1.2+cu121
- - Datasets 2.16.1
- - Tokenizers 0.15.0
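To reproduce the environment behind these results, here is a minimal `environment.yml` sketch pinning the versions listed above (the Python version and package sources are assumptions; the card only reports the library versions):

```yaml
name: phi-sft
dependencies:
  - python=3.10            # assumed; not stated on the card
  - pip
  - pip:
      - torch==2.1.2       # the card reports the CUDA 12.1 build (2.1.2+cu121)
      - transformers==4.37.0
      - datasets==2.16.1
      - tokenizers==0.15.0
      # axolotl 0.4.0 itself is typically installed from the GitHub repo linked in the badge above
```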