AlekseyKorshuk committed
Commit 8f40544
1 Parent(s): e17b95f

End of training

README.md CHANGED
@@ -23,6 +23,7 @@ tokenizer_type: AutoTokenizer
 trust_remote_code: true
 
 hub_model_id: AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt-test
+hub_strategy: every_save
 
 load_in_8bit: false
 load_in_4bit: false
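The added `hub_strategy: every_save` makes the trainer push a commit to the Hub each time a checkpoint is saved, instead of only at the end of training. A minimal sketch of the equivalent Hugging Face `TrainingArguments`, assuming axolotl forwards these fields to the underlying `Trainer` (the `output_dir` is hypothetical):

```python
from transformers import TrainingArguments

# The hub_* fields mirror the config above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="./out",
    push_to_hub=True,
    hub_model_id="AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt-test",
    hub_strategy="every_save",  # push a commit on every checkpoint save
)
```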
@@ -56,15 +57,16 @@ wandb_log_model:
 
 gradient_accumulation_steps: 1
 micro_batch_size: 16
-num_epochs: 3
+num_epochs: 1
 optimizer: paged_adamw_8bit
 adam_beta1: 0.9
 adam_beta2: 0.95
 adam_epsilon: 0.00001
-#max_grad_norm: 1.0
 lr_scheduler: cosine
-learning_rate: 2e-5
-warmup_steps: 4
+cosine_min_lr_ratio: 0.1
+learning_rate: 1e-5
+#warmup_steps: 4
+warmup_ratio: 0.1
 weight_decay: 0.01
 
 train_on_inputs: false
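This hunk halves the peak learning rate (2e-5 to 1e-5), trains for one epoch instead of three, and replaces the fixed 4-step warmup with `warmup_ratio: 0.1` plus a floor at 10% of the peak LR via `cosine_min_lr_ratio: 0.1`. A rough sketch of the intended shape, assuming linear warmup followed by cosine decay down to the floor (an illustration, not axolotl's actual scheduler code):

```python
import math

def lr_at(step, total_steps, peak_lr=1e-5, warmup_ratio=0.1, min_lr_ratio=0.1):
    """Cosine schedule with linear warmup and a minimum-LR floor (sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)
    min_lr = peak_lr * min_lr_ratio
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```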
@@ -87,8 +89,10 @@ evals_per_epoch: 1
 eval_table_size: 8 # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
 eval_table_max_new_tokens: 768 # Total number of tokens generated for predictions sent to wandb. Default is 128
 
+chat_template: chatml
 saves_per_epoch: 1
 save_total_limit: 1
+seed: 42
 debug:
 deepspeed:
 
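The new `chat_template: chatml` tells axolotl to render each conversation with ChatML's `<|im_start|>`/`<|im_end|>` delimiters before tokenization. A small sketch of that layout (the sample messages are invented):

```python
def to_chatml(messages):
    """Render messages in the ChatML layout selected by `chat_template: chatml`."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

print(to_chatml([
    {"role": "user", "content": "Reverse a string in Python."},
    {"role": "assistant", "content": "def rev(s): return s[::-1]"},
]))
```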
 
@@ -109,7 +113,7 @@ tokens:
 
 This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.0121
+- Loss: 0.9473
 
 ## Model description
 
@@ -128,27 +132,24 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 2e-05
+- learning_rate: 1e-05
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 8
-- total_train_batch_size: 128
-- total_eval_batch_size: 128
+- num_devices: 4
+- total_train_batch_size: 64
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 4
-- num_epochs: 3
+- lr_scheduler_warmup_steps: 8
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.0571 | 0.01 | 1 | 1.3648 |
-| 0.8044 | 1.0 | 82 | 1.0212 |
-| 0.7486 | 2.0 | 164 | 1.0126 |
-| 0.7745 | 3.0 | 246 | 1.0121 |
+| 0.7928 | 1.0 | 334 | 0.9473 |
 
 
 ### Framework versions
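The derived values in this hunk follow from the config: with 4 devices instead of 8, the effective batch drops from 128 to 64. A quick check of the arithmetic (the per-epoch count is approximate; it ignores sequence packing and any partial final batch):

```python
micro_batch_size = 16            # per-device, from the config above
gradient_accumulation_steps = 1
num_devices = 4                  # down from 8 in the previous run

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 64  # matches the card

steps_per_epoch = 334            # from the results table
print(total_train_batch_size * steps_per_epoch)  # ~21,376 samples seen per epoch
```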
 
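With the single-epoch run, the reported eval loss improves from 1.0121 to 0.9473. For completeness, a hedged sketch of pulling the published checkpoint; the repo id comes from `hub_model_id` above, and `trust_remote_code=True` mirrors the config:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt-test"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
```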
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4e6b2ac5cd4fca2335a391e321a6ed3737b759803f20b91b91e19d1fa1e95c08
+oid sha256:5d0f618771c029efc7ff584a2754063b34050220f3d9592cd91d592c95eff98f
 size 4995584424
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f8b5a250a1c279dc3032236b9641e76e72c47a57ab50db7beed09bb9615f1789
+oid sha256:f10a213020ad3bffe568f392871dd77624ea878d8839fd3c2c80a98030a7888b
 size 563832976
pytorch_model-00001-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7b89297bf572d4668437fed3b2e66f4ad7def0a4f5e99d8ea0d8db73ac1927a0
+oid sha256:e643426a17fd8e3cd55a20be014979ca75fc08b8b7bbd1005d52bdfa4c50dada
 size 4995685160
pytorch_model-00002-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9babe694436407ae9a4421b5e2e3438fa875bcfd0c3437877df2b5c83b15e810
+oid sha256:ef990358fbff965e896e7621fbaafec2e27650fe5f69a7ef679cafa25e4ab386
 size 563839915
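Every weight shard lives in Git LFS, so each diff above touches only the three-line pointer: the spec version, the content hash (`oid`), and the byte `size`. The sizes are identical before and after because the retrained weights have the same shapes and dtypes; only the hashes change. A tiny sketch of reading such a pointer, assuming the simple `key value` format shown (the path reuses a filename from above):

```python
def read_lfs_pointer(path):
    """Parse a git-lfs pointer file into a dict of its `key value` lines."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields

ptr = read_lfs_pointer("model-00001-of-00002.safetensors")
print(ptr["oid"], ptr["size"])  # e.g. sha256:5d0f6187... 4995584424
```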