AlekseyKorshuk committed on
Commit 805b89b
1 Parent(s): 6ae915c

End of training

README.md CHANGED
@@ -61,13 +61,13 @@ num_epochs: 1
 optimizer: paged_adamw_8bit
 adam_beta1: 0.9
 adam_beta2: 0.95
+max_grad_norm: 1.0
 adam_epsilon: 0.00001
 lr_scheduler: cosine
 cosine_min_lr_ratio: 0.1
-learning_rate: 1e-5
-#warmup_steps: 4
+learning_rate: 4e-5
 warmup_ratio: 0.1
-weight_decay: 0.01
+weight_decay: 0.1

 train_on_inputs: false
 group_by_length: false
@@ -75,6 +75,7 @@ bf16: false
 fp16: false
 tf32: false
 float16: true
+bloat16: false

 gradient_checkpointing: true
 early_stopping_patience:
@@ -85,7 +86,7 @@ xformers_attention:
 flash_attention: true


-evals_per_epoch: 1
+evals_per_epoch: 5
 eval_table_size: 8 # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
 eval_table_max_new_tokens: 768 # Total number of tokens generated for predictions sent to wandb. Default is 128

@@ -113,7 +114,7 @@ tokens:

 This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.9473
+- Loss: 0.8954

 ## Model description

@@ -132,24 +133,28 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
-- learning_rate: 1e-05
+- learning_rate: 4e-05
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
-- total_train_batch_size: 64
-- total_eval_batch_size: 64
+- num_devices: 8
+- total_train_batch_size: 128
+- total_eval_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 8
+- lr_scheduler_warmup_steps: 2
 - num_epochs: 1

 ### Training results

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.7928        | 1.0   | 334  | 0.9473          |
+| 1.0814        | 0.01  | 1    | 1.3422          |
+| 0.8144        | 0.2   | 34   | 0.9416          |
+| 0.7945        | 0.41  | 68   | 0.9114          |
+| 0.7396        | 0.61  | 102  | 0.9004          |
+| 0.7636        | 0.81  | 136  | 0.8954          |


 ### Framework versions
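
The updated hyperparameter list reports derived values alongside the per-device settings. A minimal sketch of that arithmetic, assuming a gradient accumulation factor of 1 (the diff shows no `gradient_accumulation_steps` entry), plus the learning-rate floor implied by `cosine_min_lr_ratio`:

```python
# Sanity check of the derived values in this commit's README.
# All inputs come from the diff above except gradient_accumulation_steps,
# which the diff does not show; it is assumed to be 1 here.
train_batch_size = 16            # per-device batch size (unchanged)
num_devices = 8                  # raised from 4 in this commit
gradient_accumulation_steps = 1  # assumption: not present in the diff

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the updated README value

learning_rate = 4e-5        # raised from 1e-5 in this commit
cosine_min_lr_ratio = 0.1   # cosine schedule decays to this fraction of the peak LR
min_lr = learning_rate * cosine_min_lr_ratio  # ~4e-06 at the end of training
```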
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5d0f618771c029efc7ff584a2754063b34050220f3d9592cd91d592c95eff98f
+oid sha256:d579bacd8ec057830f9c33f2ad2ca02af4eff997a7a3f6cc9fe5008011c97fac
 size 4995584424
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f10a213020ad3bffe568f392871dd77624ea878d8839fd3c2c80a98030a7888b
+oid sha256:1618866b10caf0ca884cd16716f07996b4fc149126e7f3a5cfef2f611704fddc
 size 563832976
pytorch_model-00001-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e643426a17fd8e3cd55a20be014979ca75fc08b8b7bbd1005d52bdfa4c50dada
+oid sha256:3eabd8d6302ac1311cee2a2d55c43a2f3630022b8a6025a4e54bee63dc27236e
 size 4995685160
pytorch_model-00002-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ef990358fbff965e896e7621fbaafec2e27650fe5f69a7ef679cafa25e4ab386
+oid sha256:7aeda4f5e6e1c98bb621b69b84115628a88bd053766e43795f8663d899b11c06
 size 563839915