AlekseyKorshuk committed
Commit 8f40544
1 Parent(s): e17b95f

End of training

README.md CHANGED
@@ -23,6 +23,7 @@ tokenizer_type: AutoTokenizer
 trust_remote_code: true
 
 hub_model_id: AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt-test
+hub_strategy: every_save
 
 load_in_8bit: false
 load_in_4bit: false
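The added `hub_strategy: every_save` makes the trainer push a commit to the Hub each time a checkpoint is saved, instead of only at the end of training. A minimal sketch of the equivalent Hugging Face `TrainingArguments`, assuming axolotl forwards these fields to the underlying `Trainer` (the `output_dir` is hypothetical):

```python
from transformers import TrainingArguments

# The hub_* fields mirror the config above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="./out",
    push_to_hub=True,
    hub_model_id="AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt-test",
    hub_strategy="every_save",  # push a commit on every checkpoint save
)
```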
@@ -56,15 +57,16 @@ wandb_log_model:
 
 gradient_accumulation_steps: 1
 micro_batch_size: 16
-num_epochs: 3
+num_epochs: 1
 optimizer: paged_adamw_8bit
 adam_beta1: 0.9
 adam_beta2: 0.95
 adam_epsilon: 0.00001
-#max_grad_norm: 1.0
 lr_scheduler: cosine
-learning_rate: 2e-5
-warmup_steps: 4
+cosine_min_lr_ratio: 0.1
+learning_rate: 1e-5
+#warmup_steps: 4
+warmup_ratio: 0.1
 weight_decay: 0.01
 
 train_on_inputs: false
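This hunk halves the peak learning rate (2e-5 to 1e-5), trains for one epoch instead of three, and replaces the fixed 4-step warmup with `warmup_ratio: 0.1` plus a floor at 10% of the peak LR via `cosine_min_lr_ratio: 0.1`. A rough sketch of the intended shape, assuming linear warmup followed by cosine decay down to the floor (an illustration, not axolotl's actual scheduler code):

```python
import math

def lr_at(step, total_steps, peak_lr=1e-5, warmup_ratio=0.1, min_lr_ratio=0.1):
    """Cosine schedule with linear warmup and a minimum-LR floor (sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)
    min_lr = peak_lr * min_lr_ratio
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```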
@@ -87,8 +89,10 @@ evals_per_epoch: 1
 eval_table_size: 8 # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
 eval_table_max_new_tokens: 768 # Total number of tokens generated for predictions sent to wandb. Default is 128
 
+chat_template: chatml
 saves_per_epoch: 1
 save_total_limit: 1
+seed: 42
 debug:
 deepspeed:
 
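The new `chat_template: chatml` tells axolotl to render each conversation with ChatML's `<|im_start|>`/`<|im_end|>` delimiters before tokenization. A small sketch of that layout (the sample messages are invented):

```python
def to_chatml(messages):
    """Render messages in the ChatML layout selected by `chat_template: chatml`."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

print(to_chatml([
    {"role": "user", "content": "Reverse a string in Python."},
    {"role": "assistant", "content": "def rev(s): return s[::-1]"},
]))
```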
 
@@ -109,7 +113,7 @@ tokens:
 
 This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.0121
+- Loss: 0.9473
 
 ## Model description
 
@@ -128,27 +132,24 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 2e-05
+- learning_rate: 1e-05
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 8
-- total_train_batch_size: 128
-- total_eval_batch_size: 128
+- num_devices: 4
+- total_train_batch_size: 64
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 4
-- num_epochs: 3
+- lr_scheduler_warmup_steps: 8
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.0571 | 0.01 | 1 | 1.3648 |
-| 0.8044 | 1.0 | 82 | 1.0212 |
-| 0.7486 | 2.0 | 164 | 1.0126 |
-| 0.7745 | 3.0 | 246 | 1.0121 |
+| 0.7928 | 1.0 | 334 | 0.9473 |
 
 
 ### Framework versions
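The derived values in this hunk follow from the config: with 4 devices instead of 8, the effective batch drops from 128 to 64. A quick check of the arithmetic (the per-epoch count is approximate; it ignores sequence packing and any partial final batch):

```python
micro_batch_size = 16            # per-device, from the config above
gradient_accumulation_steps = 1
num_devices = 4                  # down from 8 in the previous run

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 64  # matches the card

steps_per_epoch = 334            # from the results table
print(total_train_batch_size * steps_per_epoch)  # ~21,376 samples seen per epoch
```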
 
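With the single-epoch run, the reported eval loss improves from 1.0121 to 0.9473. For completeness, a hedged sketch of pulling the published checkpoint; the repo id comes from `hub_model_id` above, and `trust_remote_code=True` mirrors the config:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt-test"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
```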
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4e6b2ac5cd4fca2335a391e321a6ed3737b759803f20b91b91e19d1fa1e95c08
+oid sha256:5d0f618771c029efc7ff584a2754063b34050220f3d9592cd91d592c95eff98f
 size 4995584424
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f8b5a250a1c279dc3032236b9641e76e72c47a57ab50db7beed09bb9615f1789
+oid sha256:f10a213020ad3bffe568f392871dd77624ea878d8839fd3c2c80a98030a7888b
 size 563832976
pytorch_model-00001-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7b89297bf572d4668437fed3b2e66f4ad7def0a4f5e99d8ea0d8db73ac1927a0
+oid sha256:e643426a17fd8e3cd55a20be014979ca75fc08b8b7bbd1005d52bdfa4c50dada
 size 4995685160
pytorch_model-00002-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9babe694436407ae9a4421b5e2e3438fa875bcfd0c3437877df2b5c83b15e810
+oid sha256:ef990358fbff965e896e7621fbaafec2e27650fe5f69a7ef679cafa25e4ab386
 size 563839915
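Every weight shard lives in Git LFS, so each diff above touches only the three-line pointer: the spec version, the content hash (`oid`), and the byte `size`. The sizes are identical before and after because the retrained weights have the same shapes and dtypes; only the hashes change. A tiny sketch of reading such a pointer, assuming the simple `key value` format shown (the path reuses a filename from above):

```python
def read_lfs_pointer(path):
    """Parse a git-lfs pointer file into a dict of its `key value` lines."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields

ptr = read_lfs_pointer("model-00001-of-00002.safetensors")
print(ptr["oid"], ptr["size"])  # e.g. sha256:5d0f6187... 4995584424
```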