pszemraj committed on
Commit cec0b20
1 Parent(s): b2e34e3

write about updated checkpoint

Files changed (1)
  1. README.md +10 -4
README.md CHANGED
@@ -122,21 +122,27 @@ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of
 
 ## Training procedure
 
+### Updates:
+
+- Added a new version on July 3, 2022, with several epochs of additional training that is more performant in general.
+
 ### Training hyperparameters
 
 The following hyperparameters were used during the **final** training round\*:
-- learning_rate: 0.0004
-- train_batch_size: 2
+
+- learning_rate: 0.001
+- train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
 - distributed_type: multi-GPU
 - gradient_accumulation_steps: 64
-- total_train_batch_size: 128
+- total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.02
+- lr_scheduler_warmup_ratio: 0.01
 - num_epochs: 2
 
+
 \*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train_
 
 ### Training results
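
For context, here is a minimal sketch of how the updated hyperparameters above could be expressed with the Hugging Face `transformers` `TrainingArguments` API. This assumes a standard `Trainer`-style setup and is not the exact training script behind this commit; `output_dir` is a placeholder.

```python
# Illustrative mapping of the updated README hyperparameters to TrainingArguments.
# Note: total_train_batch_size = 64 in the README corresponds to
# per-device batch size (1) x gradient_accumulation_steps (64).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="checkpoints",          # placeholder, not taken from the commit
    learning_rate=1e-3,                # learning_rate: 0.001
    per_device_train_batch_size=1,     # train_batch_size: 1
    per_device_eval_batch_size=1,      # eval_batch_size: 1
    seed=42,
    gradient_accumulation_steps=64,
    adam_beta1=0.9,                    # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                 # epsilon: 1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,                 # lr_scheduler_warmup_ratio: 0.01
    num_train_epochs=2,
)
# distributed_type: multi-GPU is configured by the launcher
# (e.g. accelerate / torchrun), not by TrainingArguments itself.
```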