pszemraj committed
Commit c6d4e41
1 Parent(s): 8528f53

adan details

Files changed (1):
  1. README.md +3 -1
README.md CHANGED
@@ -207,6 +207,7 @@ TODO
 
 #### Epochs 5 & 6
 The following hyperparameters were used during training:
+
 - learning_rate: 6e-05
 - train_batch_size: 4
 - eval_batch_size: 1
@@ -214,8 +215,9 @@ The following hyperparameters were used during training:
 - distributed_type: multi-GPU
 - gradient_accumulation_steps: 32
 - total_train_batch_size: 128
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- optimizer: _ADAN_ using lucidrains' `adan-pytorch` with default betas
 - lr_scheduler_type: constant_with_warmup
+- data type: TF32
 - num_epochs: 2
 
 ### Framework versions
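The `total_train_batch_size: 128` in the hunk above follows from the other hyperparameters. A quick sanity check (the `num_devices = 1` value is an inference from the arithmetic, since 4 × 32 already equals 128, not something the diff states):

```python
# Effective (total) train batch size implied by the listed hyperparameters:
# per-device batch size x gradient accumulation steps x number of devices.
per_device_batch_size = 4        # train_batch_size from the README
gradient_accumulation_steps = 32 # gradient_accumulation_steps from the README
num_devices = 1                  # assumption: 4 * 32 = 128 leaves no room for more

total_train_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # → 128
```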
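The two added README lines (`optimizer: _ADAN_ using lucidrains' adan-pytorch with default betas` and `data type: TF32`) can be sketched in code. This is a minimal illustration, not the author's training script: the `torch.nn.Linear` model is a hypothetical stand-in, and the Adan betas are deliberately left at the package defaults, matching "with default betas".

```python
import torch
from adan_pytorch import Adan  # lucidrains' Adan implementation

# "data type: TF32" -- enable TF32 math on Ampere+ GPUs (no-op elsewhere).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(4, 2)  # placeholder model for illustration

# lr matches the README's learning_rate (6e-05); betas stay at the
# adan-pytorch defaults, as the diff specifies.
optimizer = Adan(model.parameters(), lr=6e-5)

loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```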