yhavinga committed
Commit 2973fc4
Parent: bae580e

Update README.md

Files changed (1): README.md (+2, -1)
README.md CHANGED
```diff
@@ -145,9 +145,10 @@ Wandb run https://wandb.ai/yepster/ul2-large-de-neddx2-en-nl/runs/s3z13day?works
 * Pre-trained model used as starting point: yhavinga/ul2-large-dutch-english (3150k checkpoint)
 
 The first three epochs were trained using the T5x framework, with a batch size of 128, a constant learning rate of 0.001. This process spanned from step 3150k to 3440k.
-For the concluding epoch, a HuggingFace Flax based trainer was used with the following settings:
+For the concluding ~half epoch, a HuggingFace Flax based trainer was used with the following settings:
 
 - **Batch Size**: Total effective batch size of 512, achieved via per-device settings and gradient accumulation.
+- **Num Train Samples**: 5120k.
 - **Learning Rate**: Set at 0.0002, utilizing cosine scheduling.
 - **Optimizer**: AdamW with beta1=0.9, beta2=0.997, epsilon=1e-8.
 - **Weight Decay**: Configured to 0.001 for regularization.
```
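For context, here is a minimal sketch (not the repository's actual training script) of how the settings in this diff could be expressed with optax in a Flax training loop. The gradient-accumulation factor and the absence of warmup are assumptions, and the step count is derived from the stated numbers (5,120,000 samples / 512 effective batch size = 10,000 optimizer steps):

```python
import optax

# Stated in the README diff: 5120k train samples at an effective batch of 512.
TOTAL_SAMPLES = 5_120_000
EFFECTIVE_BATCH_SIZE = 512
TOTAL_STEPS = TOTAL_SAMPLES // EFFECTIVE_BATCH_SIZE  # 10,000 optimizer steps

# Cosine schedule decaying from the stated peak learning rate of 2e-4.
# No warmup is included here since the commit does not mention one.
learning_rate_fn = optax.cosine_decay_schedule(
    init_value=2e-4,
    decay_steps=TOTAL_STEPS,
)

# AdamW with the stated betas, epsilon, and decoupled weight decay.
optimizer = optax.adamw(
    learning_rate=learning_rate_fn,
    b1=0.9,
    b2=0.997,
    eps=1e-8,
    weight_decay=0.001,
)

# Accumulate gradients over k micro-batches before each update so the
# effective batch size reaches 512; the factor 4 here is illustrative.
optimizer = optax.MultiSteps(optimizer, every_k_schedule=4)
```

`optax.MultiSteps` applies the inner update only once every k micro-batches, averaging the gradients in between, which is how an effective batch of 512 can be reached on hardware whose per-device batch is smaller.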