AlekseyKorshuk committed on
Commit 805b89b
1 Parent(s): 6ae915c

End of training

README.md CHANGED
@@ -61,13 +61,13 @@ num_epochs: 1
 optimizer: paged_adamw_8bit
 adam_beta1: 0.9
 adam_beta2: 0.95
+max_grad_norm: 1.0
 adam_epsilon: 0.00001
 lr_scheduler: cosine
 cosine_min_lr_ratio: 0.1
-learning_rate: 1e-5
-#warmup_steps: 4
+learning_rate: 4e-5
 warmup_ratio: 0.1
-weight_decay: 0.01
+weight_decay: 0.1

 train_on_inputs: false
 group_by_length: false
@@ -75,6 +75,7 @@ bf16: false
 fp16: false
 tf32: false
 float16: true
+bloat16: false

 gradient_checkpointing: true
 early_stopping_patience:
@@ -85,7 +86,7 @@ xformers_attention:
 flash_attention: true


-evals_per_epoch: 1
+evals_per_epoch: 5
 eval_table_size: 8 # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
 eval_table_max_new_tokens: 768 # Total number of tokens generated for predictions sent to wandb. Default is 128

@@ -113,7 +114,7 @@ tokens:

 This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.9473
+- Loss: 0.8954

 ## Model description

@@ -132,24 +133,28 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
-- learning_rate: 1e-05
+- learning_rate: 4e-05
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
-- total_train_batch_size: 64
-- total_eval_batch_size: 64
+- num_devices: 8
+- total_train_batch_size: 128
+- total_eval_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 8
+- lr_scheduler_warmup_steps: 2
 - num_epochs: 1

 ### Training results

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.7928        | 1.0   | 334  | 0.9473          |
+| 1.0814        | 0.01  | 1    | 1.3422          |
+| 0.8144        | 0.2   | 34   | 0.9416          |
+| 0.7945        | 0.41  | 68   | 0.9114          |
+| 0.7396        | 0.61  | 102  | 0.9004          |
+| 0.7636        | 0.81  | 136  | 0.8954          |


 ### Framework versions
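
The updated hyperparameter list reports derived values alongside the per-device settings. A minimal sketch of that arithmetic, assuming a gradient accumulation factor of 1 (the diff shows no `gradient_accumulation_steps` entry), plus the learning-rate floor implied by `cosine_min_lr_ratio`:

```python
# Sanity check of the derived values in this commit's README.
# All inputs come from the diff above except gradient_accumulation_steps,
# which the diff does not show; it is assumed to be 1 here.
train_batch_size = 16            # per-device batch size (unchanged)
num_devices = 8                  # raised from 4 in this commit
gradient_accumulation_steps = 1  # assumption: not present in the diff

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the updated README value

learning_rate = 4e-5        # raised from 1e-5 in this commit
cosine_min_lr_ratio = 0.1   # cosine schedule decays to this fraction of the peak LR
min_lr = learning_rate * cosine_min_lr_ratio  # ~4e-06 at the end of training
```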
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5d0f618771c029efc7ff584a2754063b34050220f3d9592cd91d592c95eff98f
+oid sha256:d579bacd8ec057830f9c33f2ad2ca02af4eff997a7a3f6cc9fe5008011c97fac
 size 4995584424
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f10a213020ad3bffe568f392871dd77624ea878d8839fd3c2c80a98030a7888b
+oid sha256:1618866b10caf0ca884cd16716f07996b4fc149126e7f3a5cfef2f611704fddc
 size 563832976
pytorch_model-00001-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e643426a17fd8e3cd55a20be014979ca75fc08b8b7bbd1005d52bdfa4c50dada
+oid sha256:3eabd8d6302ac1311cee2a2d55c43a2f3630022b8a6025a4e54bee63dc27236e
 size 4995685160
pytorch_model-00002-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ef990358fbff965e896e7621fbaafec2e27650fe5f69a7ef679cafa25e4ab386
+oid sha256:7aeda4f5e6e1c98bb621b69b84115628a88bd053766e43795f8663d899b11c06
 size 563839915