End of training

Files changed (5)

README.md +11 -10
model-00001-of-00002.safetensors +1 -1
model-00002-of-00002.safetensors +1 -1
pytorch_model-00001-of-00002.bin +1 -1
pytorch_model-00002-of-00002.bin +1 -1

README.md CHANGED

@@ -35,7 +35,7 @@ datasets:
 dataset_prepared_path:
 val_set_size: 0.001
-output_dir: ./phi-sft-out
 sequence_len: 2048
 sample_packing: false  # currently unsupported
@@ -63,8 +63,8 @@ adam_beta2: 0.95
 adam_epsilon: 0.00001
 #max_grad_norm: 1.0
 lr_scheduler: cosine
-learning_rate: 1e-5
-warmup_ratio: 0.03
 weight_decay: 0.01
 train_on_inputs: false
@@ -85,7 +85,7 @@ flash_attention: true
 evals_per_epoch: 1
 eval_table_size: 8 # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
-eval_table_max_new_tokens: 512 # Total number of tokens generated for predictions sent to wandb. Default is 128
 saves_per_epoch: 1
 save_total_limit: 1
@@ -109,7 +109,7 @@ tokens:
 This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.0374
 ## Model description
@@ -128,7 +128,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-05
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
@@ -138,16 +138,17 @@ The following hyperparameters were used during training:
 - total_eval_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
 - lr_scheduler_type: cosine
 - num_epochs: 3
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.0571        | 0.01  | 1    | 1.2056          |
-| 0.8271        | 1.0   | 82   | 1.0443          |
-| 0.7871        | 2.0   | 164  | 1.0378          |
-| 0.8198        | 3.0   | 246  | 1.0374          |
 ### Framework versions

 dataset_prepared_path:
 val_set_size: 0.001
+output_dir: ./output
 sequence_len: 2048
 sample_packing: false  # currently unsupported
 adam_epsilon: 0.00001
 #max_grad_norm: 1.0
 lr_scheduler: cosine
+learning_rate: 2e-5
+warmup_steps: 4
 weight_decay: 0.01
 train_on_inputs: false
 evals_per_epoch: 1
 eval_table_size: 8 # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
+eval_table_max_new_tokens: 768 # Total number of tokens generated for predictions sent to wandb. Default is 128
 saves_per_epoch: 1
 save_total_limit: 1
 This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.0121
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 2e-05
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
 - total_eval_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
 - lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 4
 - num_epochs: 3
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 1.0571        | 0.01  | 1    | 1.3648          |
+| 0.8044        | 1.0   | 82   | 1.0212          |
+| 0.7486        | 2.0   | 164  | 1.0126          |
+| 0.7745        | 3.0   | 246  | 1.0121          |
 ### Framework versions
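
Reviewer note: this commit doubles the peak learning rate (1e-5 to 2e-5) and replaces `warmup_ratio: 0.03` with `warmup_steps: 4`. The sketch below shows what the new schedule looks like over the 246 optimizer steps reported in the results table. It is a plain-Python approximation of a linear-warmup, half-cosine-decay schedule, not the trainer's own code; the function name `cosine_lr` and the assumption that the learning rate decays to zero at the final step are illustrative. The old warmup length is estimated by rounding `0.03 * 246` up, which is also an assumption about how the ratio was converted to steps.

```python
import math

# Values taken from the diff above.
TOTAL_STEPS = 246      # last step in the training results table (epoch 3.0)
NEW_PEAK_LR = 2e-5     # learning_rate after this commit
NEW_WARMUP_STEPS = 4   # warmup_steps after this commit
OLD_WARMUP_RATIO = 0.03

# Assumption: a warmup ratio is converted to steps by rounding up.
old_warmup_steps = math.ceil(TOTAL_STEPS * OLD_WARMUP_RATIO)


def cosine_lr(step, peak_lr=NEW_PEAK_LR, warmup_steps=NEW_WARMUP_STEPS,
              total_steps=TOTAL_STEPS):
    """Hypothetical helper: linear warmup followed by half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"estimated old warmup: {old_warmup_steps} steps; new warmup: {NEW_WARMUP_STEPS} steps")
    # Steps at which the results table reports evaluation, plus the warmup boundary.
    for step in (1, 4, 82, 164, 246):
        print(f"step {step:3d}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the new run reaches its peak rate after 4 steps instead of roughly 8, so more of the first epoch is trained at or near the higher rate.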

model-00001-of-00002.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1b8e440ba098c6693e1970dbfdf8684e3a14da4bbbdade6d0ad2cd0a952c2b46
 size 4995584424

 version https://git-lfs.github.com/spec/v1
+oid sha256:4e6b2ac5cd4fca2335a391e321a6ed3737b759803f20b91b91e19d1fa1e95c08
 size 4995584424

model-00002-of-00002.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dacdf3eaf603bfb78a59db4f250e37abcabf85e593fd346c5fd74fc7e76839b6
 size 563832976

 version https://git-lfs.github.com/spec/v1
+oid sha256:f8b5a250a1c279dc3032236b9641e76e72c47a57ab50db7beed09bb9615f1789
 size 563832976

pytorch_model-00001-of-00002.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d4ca6187ba5a05330be169c2ba5b686eb828e7ee70bbbc417f00527486bc017d
 size 4995685160

 version https://git-lfs.github.com/spec/v1
+oid sha256:7b89297bf572d4668437fed3b2e66f4ad7def0a4f5e99d8ea0d8db73ac1927a0
 size 4995685160

pytorch_model-00002-of-00002.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9a1598764727976664df4ce877a1b746d5eafb550412192922e7bac0695011e1
 size 563839915

 version https://git-lfs.github.com/spec/v1
+oid sha256:9babe694436407ae9a4421b5e2e3438fa875bcfd0c3437877df2b5c83b15e810
 size 563839915
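
Reviewer note: the four weight files above are Git LFS pointers, so the diff only shows a new `oid sha256:` for each while `size` stays the same. An LFS pointer's `oid` is the SHA-256 of the real file contents and `size` is its length in bytes, so a downloaded shard can be checked against its pointer. The sketch below is one way to do that with the standard library only; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is whatever the reader has downloaded.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'version / oid / size' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,            # "sha256" for the pointers in this commit
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte shards do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# New pointer for model-00002-of-00002.safetensors, copied from the diff above.
NEW_POINTER = parse_lfs_pointer(
    """
    version https://git-lfs.github.com/spec/v1
    oid sha256:f8b5a250a1c279dc3032236b9641e76e72c47a57ab50db7beed09bb9615f1789
    size 563832976
    """
)
```

With a shard downloaded locally, `matches_pointer("model-00002-of-00002.safetensors", NEW_POINTER)` would report whether it corresponds to this commit rather than the previous one.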