flax-community/t5-base-dutch

yhavinga committed on Jul 19, 2021

Commit 2c7b7d9 • 1 Parent(s): e8cec0f

Update README.md

Files changed (1)

README.md +8 -1

@@ -28,4 +28,11 @@ See the `clean` directory for the clean script.
 ## Training
-The model was trained for 63000 steps with a batch size of 128, ending with an evaluation loss of 1.79 and accuracy of 0.64.
+Training of the model was resumed from an earlier checkpoint several times, as can be seen in the training metrics tab (switch to wall time for a better view).
+After several hours of training, an error would be raised that we have not been able to identify or solve. As a workaround,
+the first few resumes started again at step 0 with a differently seeded reshuffling of the data.
+In the last two resumes the random seed was fixed and training resumed at the previous step. A try/except around the failing example allowed training to continue when an error was caused by a single example.
+The final model was trained for 63000 steps with a batch size of 128, ending with an evaluation loss of 1.79 and accuracy of 0.64.
+A triangular learning rate schedule was used, with a peak learning rate of 0.01 for the first few runs and 0.001 for the last two runs.
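
The README text in this diff mentions a triangular learning rate schedule but gives no warmup length or code. As a rough illustration only, the sketch below assumes a linear warmup from 0 to the peak followed by a linear decay back to 0. The function name `triangular_lr` and the `warmup_steps=1000` value are hypothetical; only the peak of 0.001 and the 63000 total steps come from the text, and the actual training script may implement the schedule differently.

```python
def triangular_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0.

    Assumes 0 < warmup_steps < total_steps.
    """
    if step < warmup_steps:
        # Rising edge of the triangle.
        return peak_lr * step / warmup_steps
    # Falling edge: reaches 0 at total_steps and stays there afterwards.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


# warmup_steps is an assumption; the README does not state it.
for step in (0, 500, 1000, 32000, 63000):
    lr = triangular_lr(step, peak_lr=0.001, warmup_steps=1000, total_steps=63000)
    print(step, lr)
```

With these assumed values the rate peaks at step 1000 and has fallen to half the peak by step 32000.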
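
The try/except workaround described in the diff can be pictured with a small, framework-free sketch. It assumes the error is raised while processing one batch and that skipping that batch is acceptable. The names `train_with_skips`, `toy_step`, and the toy data are all hypothetical and are not taken from the project's training script.

```python
def train_with_skips(batches, train_step, state, start_step=0):
    """Apply train_step to each batch, skipping batches that raise."""
    skipped = []
    step = start_step
    for index, batch in enumerate(batches):
        try:
            state = train_step(state, batch)
        except Exception as err:
            # Record the failing batch and keep going instead of crashing.
            skipped.append((index, repr(err)))
            continue
        step += 1
    return state, step, skipped


def toy_step(state, batch):
    # Stand-in for a real update: fails on a "bad" example.
    if batch is None:
        raise ValueError("bad example")
    return state + batch


state, step, skipped = train_with_skips([1, None, 2], toy_step, state=0)
print(state, step, skipped)
```

Catching a broad `Exception` keeps the sketch short; in practice it would be safer to log the skipped examples and narrow the exception type once the cause is known.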