yhavinga committed on
Commit
2c7b7d9
1 Parent(s): e8cec0f

Update README.md

  1. README.md +8 -1
README.md CHANGED
@@ -28,4 +28,11 @@ See the `clean` directory for the clean script.
28
 
29
  ## Training
30
 
31
- The model was trained for 63000 steps with a batch size of 128, ending with an evaluation loss of 1.79 and accuracy of 0.64.
 
 
 
 
 
 
 
28
 
29
  ## Training
30
 
31
+ Training was resumed from an earlier checkpoint several times, as can be seen in the training metrics tab (switch to wall time for a better view).
32
+
33
+ After several hours of training, an error would be raised that we have not been able to identify or solve. As a workaround,
34
+ the first few resumes restarted at step 0 with a differently seeded reshuffling of the data.
35
+ In the last two resumes, the random seed was fixed and training resumed at the previous step, because a try/except around the failing example let training continue past errors caused by a single example.
36
+
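The try/except workaround described above could look roughly like the sketch below. This is not the actual training script: `train_with_skips`, `train_step`, `batches`, and `state` are hypothetical names, and the broad `except Exception` is an assumption made because the real error type was never identified.

```python
def train_with_skips(batches, train_step, state):
    """Run train_step over batches, skipping any batch that raises.

    All names here are hypothetical; the real training loop is not shown
    in the README.
    """
    skipped = []
    for step, batch in enumerate(batches):
        try:
            state = train_step(state, batch)
        except Exception as err:  # broad on purpose: the error type is unknown
            # Record the failure and move on to the next batch, so a single
            # bad example does not abort the whole run.
            skipped.append((step, repr(err)))
            continue
    return state, skipped
```

Because a failing batch is skipped rather than retried, resuming with a fixed seed replays the same data order up to the previous step, which is presumably why the later resumes no longer needed to start from step 0.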
37
+ The final model was trained for 63000 steps with a batch size of 128, ending with an evaluation loss of 1.79 and accuracy of 0.64.
38
+ A triangular learning rate schedule was used, with a peak learning rate of 0.01 for the first few runs and 0.001 for the last two.
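As an illustration of what a triangular schedule means, here is a minimal sketch that rises linearly from 0 to the peak and then decays linearly back to 0. The function name `triangular_lr` and the `warmup_steps` parameter are assumptions; the README gives only the peak learning rates and the total number of steps, not the length of the warmup phase.

```python
def triangular_lr(step, total_steps, peak_lr, warmup_steps):
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps.

    warmup_steps is a hypothetical parameter; the README does not state it.
    """
    if not 0 < warmup_steps < total_steps:
        raise ValueError("warmup_steps must be between 0 and total_steps")
    if step < warmup_steps:
        # Rising edge of the triangle.
        return peak_lr * step / warmup_steps
    # Falling edge of the triangle, clamped at 0 after total_steps.
    remaining = total_steps - step
    return max(0.0, peak_lr * remaining / (total_steps - warmup_steps))
```

For the last two runs one might call `triangular_lr(step, 63000, 0.001, warmup_steps=5000)`, where 63000 and 0.001 come from the text above and 5000 is an arbitrary placeholder.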