yhavinga committed
Commit
71b8b45
1 Parent(s): aa27a25

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -135,7 +135,7 @@ Additionally, 100+28 extra tokens were added for pre-training tasks, resulting i
 
 ### Pretraining
 The model was trained on TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/),
-for 1000000 steps with a batch size of 64
+for 2650000 steps with a batch size of 64
 (in total 32 B tokens).
 The optimizer used was AdaFactor with learning rate warmup for 10K steps with a constant learning rate of 1e-2,
 and then an inverse square root decay (exponential decay) of the learning rate after.
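The schedule described in the hunk's context lines (a 10K-step warmup at a constant learning rate of 1e-2, then inverse square root decay) matches the standard T5 pretraining recipe, where the learning rate is 1/sqrt(max(step, warmup_steps)); note that 1/sqrt(10000) = 1e-2, consistent with the stated numbers. Below is a minimal sketch of that schedule, assuming the run followed the standard formula; the exact implementation used for this training run is not shown in the commit.

```python
def inverse_sqrt_schedule(step: int, warmup_steps: int = 10_000) -> float:
    """Learning rate schedule: lr = 1 / sqrt(max(step, warmup_steps)).

    For the first 10K steps this is constant at 1 / sqrt(10_000) = 1e-2,
    matching the README; afterwards it decays with the inverse square
    root of the step count. Assumption: the run used the standard T5
    formula rather than a custom variant.
    """
    return 1.0 / max(step, warmup_steps) ** 0.5
```

For example, `inverse_sqrt_schedule(10_000)` returns 1e-2, while `inverse_sqrt_schedule(2_650_000)` returns roughly 6.1e-4, so by the end of the 2,650,000-step run the learning rate has decayed to about 1/16 of its warmup value.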