yhavinga committed
Commit
71b8b45
1 Parent(s): aa27a25

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -135,7 +135,7 @@ Additionally, 100+28 extra tokens were added for pre-training tasks, resulting i
 
 ### Pretraining
 The model was trained on TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/),
-for 1000000 steps with a batch size of 64
+for 2650000 steps with a batch size of 64
 (in total 32 B tokens).
 The optimizer used was AdaFactor with learning rate warmup for 10K steps with a constant learning rate of 1e-2,
 and then an inverse square root decay (exponential decay) of the learning rate after.
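The schedule described in the hunk's context lines (a 10K-step warmup at a constant learning rate of 1e-2, then inverse square root decay) matches the standard T5 pretraining recipe, where the learning rate is 1/sqrt(max(step, warmup_steps)); note that 1/sqrt(10000) = 1e-2, consistent with the stated numbers. Below is a minimal sketch of that schedule, assuming the run followed the standard formula; the exact implementation used for this training run is not shown in the commit.

```python
def inverse_sqrt_schedule(step: int, warmup_steps: int = 10_000) -> float:
    """Learning rate schedule: lr = 1 / sqrt(max(step, warmup_steps)).

    For the first 10K steps this is constant at 1 / sqrt(10_000) = 1e-2,
    matching the README; afterwards it decays with the inverse square
    root of the step count. Assumption: the run used the standard T5
    formula rather than a custom variant.
    """
    return 1.0 / max(step, warmup_steps) ** 0.5
```

For example, `inverse_sqrt_schedule(10_000)` returns 1e-2, while `inverse_sqrt_schedule(2_650_000)` returns roughly 6.1e-4, so by the end of the 2,650,000-step run the learning rate has decayed to about 1/16 of its warmup value.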