flair-uk-backward / loss.txt
Dmitry Chaplinsky
Updated model: 758 splits, 27.07 epochs, min_loss: 1.0411, min_ppl: 2.8325
979b5bb
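Each row below has a fixed layout: split index and split count, epoch, wall-clock time for the split, validation loss, validation perplexity, and learning rate. The header's figures fit this layout: 758 splits at 28 splits per epoch gives 758 / 28 ≈ 27.07 epochs. The logged perplexity also matches exp(loss) to within rounding. For example, exp(1.9590) ≈ 7.092 against the logged 7.0919. The sketch below parses such rows and checks that relationship. The regular expression is an assumption inferred from the rows in this file, not taken from Flair's source. The helper names `parse_row`, `ppl_consistent` and `best_row` are hypothetical.

```python
import math
import re

# Pattern for one log row, inferred (an assumption) from the rows in this file:
# | end of split S / N | epoch E | time: Ts | valid loss L | valid ppl P | learning rate R
ROW = re.compile(
    r"\|\s*end of split\s+(\d+)\s*/\s*(\d+)\s*\|\s*epoch\s+(\d+)\s*\|"
    r"\s*time:\s*([\d.]+)s\s*\|\s*valid loss\s+([\d.]+)\s*\|"
    r"\s*valid ppl\s+([\d.]+)\s*\|\s*learning rate\s+([\d.]+)"
)


def parse_row(line):
    """Return a dict for one log row, or None if the line is not a log row."""
    m = ROW.search(line)
    if m is None:
        return None
    split, n_splits, epoch = (int(g) for g in m.group(1, 2, 3))
    time_s, loss, ppl, lr = (float(g) for g in m.group(4, 5, 6, 7))
    return {"split": split, "n_splits": n_splits, "epoch": epoch,
            "time_s": time_s, "loss": loss, "ppl": ppl, "lr": lr}


def ppl_consistent(row, rel_tol=1e-3):
    """True if the logged perplexity equals exp(loss) up to 4-decimal rounding."""
    return abs(math.exp(row["loss"]) - row["ppl"]) / row["ppl"] < rel_tol


def best_row(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r["loss"])
```

Applied to two rows copied from this log, both rows pass `ppl_consistent`. The learning rate falls from 20.0 to 5.0 (a factor of 0.25) at split 5 of epoch 8:

```python
log = [
    "| end of split 1 / 28 | epoch 1 | time: 3789.14s | valid loss 1.9590 | valid ppl 7.0919 | learning rate 20.0000",
    "| end of split 5 / 28 | epoch 8 | time: 3800.00s | valid loss 1.0562 | valid ppl 2.8756 | learning rate 5.0000",
]
rows = [r for r in map(parse_row, log) if r is not None]
print(all(ppl_consistent(r) for r in rows), best_row(rows)["epoch"])
```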
| end of split 1 / 28 | epoch 1 | time: 3789.14s | valid loss 1.9590 | valid ppl 7.0919 | learning rate 20.0000
| end of split 2 / 28 | epoch 1 | time: 3789.55s | valid loss 1.5745 | valid ppl 4.8282 | learning rate 20.0000
| end of split 3 / 28 | epoch 1 | time: 3801.06s | valid loss 1.4277 | valid ppl 4.1690 | learning rate 20.0000
| end of split 4 / 28 | epoch 1 | time: 3796.22s | valid loss 1.3590 | valid ppl 3.8922 | learning rate 20.0000
| end of split 5 / 28 | epoch 1 | time: 3796.46s | valid loss 1.3225 | valid ppl 3.7527 | learning rate 20.0000
| end of split 6 / 28 | epoch 1 | time: 3800.42s | valid loss 1.2908 | valid ppl 3.6357 | learning rate 20.0000
| end of split 7 / 28 | epoch 1 | time: 3795.50s | valid loss 1.2755 | valid ppl 3.5803 | learning rate 20.0000
| end of split 8 / 28 | epoch 1 | time: 3796.83s | valid loss 1.2515 | valid ppl 3.4956 | learning rate 20.0000
| end of split 9 / 28 | epoch 1 | time: 3795.35s | valid loss 1.2422 | valid ppl 3.4631 | learning rate 20.0000
| end of split 10 / 28 | epoch 1 | time: 3797.17s | valid loss 1.2255 | valid ppl 3.4059 | learning rate 20.0000
| end of split 11 / 28 | epoch 1 | time: 3792.19s | valid loss 1.2145 | valid ppl 3.3686 | learning rate 20.0000
| end of split 12 / 28 | epoch 1 | time: 3789.43s | valid loss 1.2078 | valid ppl 3.3463 | learning rate 20.0000
| end of split 13 / 28 | epoch 1 | time: 36736.65s | valid loss 1.1987 | valid ppl 3.3159 | learning rate 20.0000
| end of split 14 / 28 | epoch 1 | time: 3787.94s | valid loss 1.1954 | valid ppl 3.3047 | learning rate 20.0000
| end of split 15 / 28 | epoch 1 | time: 3809.75s | valid loss 1.1862 | valid ppl 3.2745 | learning rate 20.0000
| end of split 16 / 28 | epoch 1 | time: 3844.97s | valid loss 1.1829 | valid ppl 3.2637 | learning rate 20.0000
| end of split 17 / 28 | epoch 1 | time: 3843.82s | valid loss 1.1774 | valid ppl 3.2460 | learning rate 20.0000
| end of split 18 / 28 | epoch 1 | time: 3846.40s | valid loss 1.1728 | valid ppl 3.2310 | learning rate 20.0000
| end of split 19 / 28 | epoch 1 | time: 3844.98s | valid loss 1.1681 | valid ppl 3.2159 | learning rate 20.0000
| end of split 20 / 28 | epoch 1 | time: 3815.00s | valid loss 1.1632 | valid ppl 3.2000 | learning rate 20.0000
| end of split 21 / 28 | epoch 1 | time: 3794.38s | valid loss 1.1613 | valid ppl 3.1939 | learning rate 20.0000
| end of split 22 / 28 | epoch 1 | time: 3796.78s | valid loss 1.1564 | valid ppl 3.1786 | learning rate 20.0000
| end of split 23 / 28 | epoch 1 | time: 3797.39s | valid loss 1.1545 | valid ppl 3.1725 | learning rate 20.0000
| end of split 24 / 28 | epoch 1 | time: 3797.94s | valid loss 1.1518 | valid ppl 3.1640 | learning rate 20.0000
| end of split 25 / 28 | epoch 1 | time: 3796.01s | valid loss 1.1469 | valid ppl 3.1485 | learning rate 20.0000
| end of split 26 / 28 | epoch 1 | time: 3796.73s | valid loss 1.1459 | valid ppl 3.1451 | learning rate 20.0000
| end of split 27 / 28 | epoch 1 | time: 3796.46s | valid loss 1.1429 | valid ppl 3.1358 | learning rate 20.0000
| end of split 28 / 28 | epoch 1 | time: 1096.56s | valid loss 1.1447 | valid ppl 3.1414 | learning rate 20.0000
| end of split 1 / 28 | epoch 2 | time: 3793.96s | valid loss 1.1414 | valid ppl 3.1312 | learning rate 20.0000
| end of split 2 / 28 | epoch 2 | time: 1096.67s | valid loss 1.1419 | valid ppl 3.1329 | learning rate 20.0000
| end of split 3 / 28 | epoch 2 | time: 3796.47s | valid loss 1.1401 | valid ppl 3.1269 | learning rate 20.0000
| end of split 4 / 28 | epoch 2 | time: 3798.81s | valid loss 1.1371 | valid ppl 3.1176 | learning rate 20.0000
| end of split 5 / 28 | epoch 2 | time: 3797.67s | valid loss 1.1361 | valid ppl 3.1146 | learning rate 20.0000
| end of split 6 / 28 | epoch 2 | time: 3798.63s | valid loss 1.1336 | valid ppl 3.1067 | learning rate 20.0000
| end of split 7 / 28 | epoch 2 | time: 3791.11s | valid loss 1.1323 | valid ppl 3.1028 | learning rate 20.0000
| end of split 8 / 28 | epoch 2 | time: 3788.66s | valid loss 1.1296 | valid ppl 3.0944 | learning rate 20.0000
| end of split 9 / 28 | epoch 2 | time: 3797.21s | valid loss 1.1272 | valid ppl 3.0869 | learning rate 20.0000
| end of split 10 / 28 | epoch 2 | time: 3794.19s | valid loss 1.1253 | valid ppl 3.0810 | learning rate 20.0000
| end of split 11 / 28 | epoch 2 | time: 3797.66s | valid loss 1.1238 | valid ppl 3.0765 | learning rate 20.0000
| end of split 12 / 28 | epoch 2 | time: 3795.30s | valid loss 1.1242 | valid ppl 3.0777 | learning rate 20.0000
| end of split 13 / 28 | epoch 2 | time: 3799.97s | valid loss 1.1220 | valid ppl 3.0710 | learning rate 20.0000
| end of split 14 / 28 | epoch 2 | time: 3798.40s | valid loss 1.1198 | valid ppl 3.0644 | learning rate 20.0000
| end of split 15 / 28 | epoch 2 | time: 3800.94s | valid loss 1.1200 | valid ppl 3.0650 | learning rate 20.0000
| end of split 16 / 28 | epoch 2 | time: 3795.23s | valid loss 1.1184 | valid ppl 3.0600 | learning rate 20.0000
| end of split 17 / 28 | epoch 2 | time: 3797.60s | valid loss 1.1181 | valid ppl 3.0591 | learning rate 20.0000
| end of split 18 / 28 | epoch 2 | time: 3794.23s | valid loss 1.1155 | valid ppl 3.0512 | learning rate 20.0000
| end of split 19 / 28 | epoch 2 | time: 3794.97s | valid loss 1.1144 | valid ppl 3.0477 | learning rate 20.0000
| end of split 20 / 28 | epoch 2 | time: 3801.57s | valid loss 1.1144 | valid ppl 3.0476 | learning rate 20.0000
| end of split 21 / 28 | epoch 2 | time: 3797.96s | valid loss 1.1128 | valid ppl 3.0428 | learning rate 20.0000
| end of split 22 / 28 | epoch 2 | time: 3797.43s | valid loss 1.1112 | valid ppl 3.0381 | learning rate 20.0000
| end of split 23 / 28 | epoch 2 | time: 3794.87s | valid loss 1.1099 | valid ppl 3.0342 | learning rate 20.0000
| end of split 24 / 28 | epoch 2 | time: 3799.90s | valid loss 1.1100 | valid ppl 3.0344 | learning rate 20.0000
| end of split 25 / 28 | epoch 2 | time: 3802.10s | valid loss 1.1083 | valid ppl 3.0291 | learning rate 20.0000
| end of split 26 / 28 | epoch 2 | time: 3800.69s | valid loss 1.1076 | valid ppl 3.0270 | learning rate 20.0000
| end of split 27 / 28 | epoch 2 | time: 3796.47s | valid loss 1.1065 | valid ppl 3.0238 | learning rate 20.0000
| end of split 28 / 28 | epoch 2 | time: 3801.18s | valid loss 1.1051 | valid ppl 3.0196 | learning rate 20.0000
| end of split 1 / 28 | epoch 3 | time: 3796.57s | valid loss 1.1045 | valid ppl 3.0176 | learning rate 20.0000
| end of split 2 / 28 | epoch 3 | time: 3801.61s | valid loss 1.1035 | valid ppl 3.0146 | learning rate 20.0000
| end of split 3 / 28 | epoch 3 | time: 3800.25s | valid loss 1.1027 | valid ppl 3.0122 | learning rate 20.0000
| end of split 4 / 28 | epoch 3 | time: 3800.72s | valid loss 1.1013 | valid ppl 3.0080 | learning rate 20.0000
| end of split 5 / 28 | epoch 3 | time: 3802.82s | valid loss 1.1010 | valid ppl 3.0072 | learning rate 20.0000
| end of split 6 / 28 | epoch 3 | time: 3802.42s | valid loss 1.1003 | valid ppl 3.0052 | learning rate 20.0000
| end of split 7 / 28 | epoch 3 | time: 3798.84s | valid loss 1.1001 | valid ppl 3.0044 | learning rate 20.0000
| end of split 8 / 28 | epoch 3 | time: 3793.80s | valid loss 1.1002 | valid ppl 3.0046 | learning rate 20.0000
| end of split 9 / 28 | epoch 3 | time: 3797.24s | valid loss 1.0987 | valid ppl 3.0002 | learning rate 20.0000
| end of split 10 / 28 | epoch 3 | time: 3795.35s | valid loss 1.0976 | valid ppl 2.9969 | learning rate 20.0000
| end of split 11 / 28 | epoch 3 | time: 3796.91s | valid loss 1.0978 | valid ppl 2.9976 | learning rate 20.0000
| end of split 12 / 28 | epoch 3 | time: 3797.71s | valid loss 1.0973 | valid ppl 2.9962 | learning rate 20.0000
| end of split 13 / 28 | epoch 3 | time: 3795.99s | valid loss 1.0967 | valid ppl 2.9943 | learning rate 20.0000
| end of split 14 / 28 | epoch 3 | time: 3795.07s | valid loss 1.0957 | valid ppl 2.9913 | learning rate 20.0000
| end of split 15 / 28 | epoch 3 | time: 3793.25s | valid loss 1.0942 | valid ppl 2.9869 | learning rate 20.0000
| end of split 16 / 28 | epoch 3 | time: 3797.79s | valid loss 1.0940 | valid ppl 2.9863 | learning rate 20.0000
| end of split 17 / 28 | epoch 3 | time: 3796.74s | valid loss 1.0934 | valid ppl 2.9844 | learning rate 20.0000
| end of split 18 / 28 | epoch 3 | time: 3794.47s | valid loss 1.0924 | valid ppl 2.9815 | learning rate 20.0000
| end of split 19 / 28 | epoch 3 | time: 3794.62s | valid loss 1.0924 | valid ppl 2.9814 | learning rate 20.0000
| end of split 20 / 28 | epoch 3 | time: 3797.27s | valid loss 1.0907 | valid ppl 2.9764 | learning rate 20.0000
| end of split 21 / 28 | epoch 3 | time: 3796.49s | valid loss 1.0909 | valid ppl 2.9770 | learning rate 20.0000
| end of split 22 / 28 | epoch 3 | time: 3798.45s | valid loss 1.0913 | valid ppl 2.9783 | learning rate 20.0000
| end of split 23 / 28 | epoch 3 | time: 1098.05s | valid loss 1.0917 | valid ppl 2.9792 | learning rate 20.0000
| end of split 24 / 28 | epoch 3 | time: 3789.62s | valid loss 1.0908 | valid ppl 2.9768 | learning rate 20.0000
| end of split 25 / 28 | epoch 3 | time: 3790.60s | valid loss 1.0899 | valid ppl 2.9739 | learning rate 20.0000
| end of split 26 / 28 | epoch 3 | time: 3794.69s | valid loss 1.0878 | valid ppl 2.9677 | learning rate 20.0000
| end of split 27 / 28 | epoch 3 | time: 3789.68s | valid loss 1.0886 | valid ppl 2.9702 | learning rate 20.0000
| end of split 28 / 28 | epoch 3 | time: 3798.26s | valid loss 1.0890 | valid ppl 2.9712 | learning rate 20.0000
| end of split 1 / 28 | epoch 4 | time: 3791.05s | valid loss 1.0875 | valid ppl 2.9668 | learning rate 20.0000
| end of split 2 / 28 | epoch 4 | time: 3801.11s | valid loss 1.0872 | valid ppl 2.9658 | learning rate 20.0000
| end of split 3 / 28 | epoch 4 | time: 3799.85s | valid loss 1.0874 | valid ppl 2.9665 | learning rate 20.0000
| end of split 4 / 28 | epoch 4 | time: 3798.81s | valid loss 1.0856 | valid ppl 2.9611 | learning rate 20.0000
| end of split 5 / 28 | epoch 4 | time: 3799.37s | valid loss 1.0849 | valid ppl 2.9591 | learning rate 20.0000
| end of split 6 / 28 | epoch 4 | time: 3794.42s | valid loss 1.0845 | valid ppl 2.9578 | learning rate 20.0000
| end of split 7 / 28 | epoch 4 | time: 3795.86s | valid loss 1.0865 | valid ppl 2.9639 | learning rate 20.0000
| end of split 8 / 28 | epoch 4 | time: 3796.29s | valid loss 1.0845 | valid ppl 2.9580 | learning rate 20.0000
| end of split 9 / 28 | epoch 4 | time: 3799.07s | valid loss 1.0838 | valid ppl 2.9560 | learning rate 20.0000
| end of split 10 / 28 | epoch 4 | time: 3798.77s | valid loss 1.0856 | valid ppl 2.9612 | learning rate 20.0000
| end of split 11 / 28 | epoch 4 | time: 3795.42s | valid loss 1.0826 | valid ppl 2.9524 | learning rate 20.0000
| end of split 12 / 28 | epoch 4 | time: 3798.31s | valid loss 1.0829 | valid ppl 2.9533 | learning rate 20.0000
| end of split 13 / 28 | epoch 4 | time: 1097.39s | valid loss 1.0828 | valid ppl 2.9528 | learning rate 20.0000
| end of split 14 / 28 | epoch 4 | time: 3796.62s | valid loss 1.0831 | valid ppl 2.9538 | learning rate 20.0000
| end of split 15 / 28 | epoch 4 | time: 3794.73s | valid loss 1.0821 | valid ppl 2.9508 | learning rate 20.0000
| end of split 16 / 28 | epoch 4 | time: 3797.00s | valid loss 1.0810 | valid ppl 2.9476 | learning rate 20.0000
| end of split 17 / 28 | epoch 4 | time: 3806.15s | valid loss 1.0812 | valid ppl 2.9481 | learning rate 20.0000
| end of split 18 / 28 | epoch 4 | time: 3806.71s | valid loss 1.0809 | valid ppl 2.9473 | learning rate 20.0000
| end of split 19 / 28 | epoch 4 | time: 3795.87s | valid loss 1.0813 | valid ppl 2.9484 | learning rate 20.0000
| end of split 20 / 28 | epoch 4 | time: 3799.98s | valid loss 1.0817 | valid ppl 2.9497 | learning rate 20.0000
| end of split 21 / 28 | epoch 4 | time: 3795.32s | valid loss 1.0803 | valid ppl 2.9455 | learning rate 20.0000
| end of split 22 / 28 | epoch 4 | time: 3794.34s | valid loss 1.0797 | valid ppl 2.9438 | learning rate 20.0000
| end of split 23 / 28 | epoch 4 | time: 3804.34s | valid loss 1.0790 | valid ppl 2.9417 | learning rate 20.0000
| end of split 24 / 28 | epoch 4 | time: 3798.90s | valid loss 1.0796 | valid ppl 2.9434 | learning rate 20.0000
| end of split 25 / 28 | epoch 4 | time: 3804.95s | valid loss 1.0802 | valid ppl 2.9454 | learning rate 20.0000
| end of split 26 / 28 | epoch 4 | time: 3799.98s | valid loss 1.0779 | valid ppl 2.9385 | learning rate 20.0000
| end of split 27 / 28 | epoch 4 | time: 3804.99s | valid loss 1.0798 | valid ppl 2.9441 | learning rate 20.0000
| end of split 28 / 28 | epoch 4 | time: 3804.92s | valid loss 1.0784 | valid ppl 2.9399 | learning rate 20.0000
| end of split 1 / 28 | epoch 5 | time: 3793.19s | valid loss 1.0781 | valid ppl 2.9390 | learning rate 20.0000
| end of split 2 / 28 | epoch 5 | time: 3794.63s | valid loss 1.0771 | valid ppl 2.9363 | learning rate 20.0000
| end of split 3 / 28 | epoch 5 | time: 3797.63s | valid loss 1.0761 | valid ppl 2.9333 | learning rate 20.0000
| end of split 4 / 28 | epoch 5 | time: 3797.24s | valid loss 1.0752 | valid ppl 2.9305 | learning rate 20.0000
| end of split 5 / 28 | epoch 5 | time: 3835.87s | valid loss 1.0764 | valid ppl 2.9340 | learning rate 20.0000
| end of split 6 / 28 | epoch 5 | time: 3836.48s | valid loss 1.0759 | valid ppl 2.9327 | learning rate 20.0000
| end of split 7 / 28 | epoch 5 | time: 3804.72s | valid loss 1.0756 | valid ppl 2.9319 | learning rate 20.0000
| end of split 8 / 28 | epoch 5 | time: 3797.48s | valid loss 1.0757 | valid ppl 2.9321 | learning rate 20.0000
| end of split 9 / 28 | epoch 5 | time: 3800.06s | valid loss 1.0751 | valid ppl 2.9303 | learning rate 20.0000
| end of split 10 / 28 | epoch 5 | time: 3796.96s | valid loss 1.0766 | valid ppl 2.9346 | learning rate 20.0000
| end of split 11 / 28 | epoch 5 | time: 3796.87s | valid loss 1.0751 | valid ppl 2.9303 | learning rate 20.0000
| end of split 12 / 28 | epoch 5 | time: 3794.98s | valid loss 1.0740 | valid ppl 2.9270 | learning rate 20.0000
| end of split 13 / 28 | epoch 5 | time: 3794.18s | valid loss 1.0737 | valid ppl 2.9261 | learning rate 20.0000
| end of split 14 / 28 | epoch 5 | time: 3794.87s | valid loss 1.0749 | valid ppl 2.9296 | learning rate 20.0000
| end of split 15 / 28 | epoch 5 | time: 3794.59s | valid loss 1.0737 | valid ppl 2.9263 | learning rate 20.0000
| end of split 16 / 28 | epoch 5 | time: 3798.73s | valid loss 1.0746 | valid ppl 2.9288 | learning rate 20.0000
| end of split 17 / 28 | epoch 5 | time: 3799.97s | valid loss 1.0912 | valid ppl 2.9777 | learning rate 20.0000
| end of split 18 / 28 | epoch 5 | time: 1097.48s | valid loss 1.0744 | valid ppl 2.9284 | learning rate 20.0000
| end of split 19 / 28 | epoch 5 | time: 3800.18s | valid loss 1.0725 | valid ppl 2.9227 | learning rate 20.0000
| end of split 20 / 28 | epoch 5 | time: 3801.07s | valid loss 1.0746 | valid ppl 2.9288 | learning rate 20.0000
| end of split 21 / 28 | epoch 5 | time: 3803.87s | valid loss 1.0742 | valid ppl 2.9277 | learning rate 20.0000
| end of split 22 / 28 | epoch 5 | time: 3807.38s | valid loss 1.0745 | valid ppl 2.9286 | learning rate 20.0000
| end of split 23 / 28 | epoch 5 | time: 3802.41s | valid loss 1.0735 | valid ppl 2.9255 | learning rate 20.0000
| end of split 24 / 28 | epoch 5 | time: 3803.85s | valid loss 1.0714 | valid ppl 2.9193 | learning rate 20.0000
| end of split 25 / 28 | epoch 5 | time: 3802.20s | valid loss 1.0703 | valid ppl 2.9163 | learning rate 20.0000
| end of split 26 / 28 | epoch 5 | time: 3804.97s | valid loss 1.0696 | valid ppl 2.9142 | learning rate 20.0000
| end of split 27 / 28 | epoch 5 | time: 3805.82s | valid loss 1.0704 | valid ppl 2.9167 | learning rate 20.0000
| end of split 28 / 28 | epoch 5 | time: 3804.59s | valid loss 1.0692 | valid ppl 2.9130 | learning rate 20.0000
| end of split 1 / 28 | epoch 6 | time: 3798.75s | valid loss 1.0703 | valid ppl 2.9162 | learning rate 20.0000
| end of split 2 / 28 | epoch 6 | time: 3801.06s | valid loss 1.0702 | valid ppl 2.9159 | learning rate 20.0000
| end of split 3 / 28 | epoch 6 | time: 3796.51s | valid loss 1.0690 | valid ppl 2.9123 | learning rate 20.0000
| end of split 4 / 28 | epoch 6 | time: 3797.49s | valid loss 1.0686 | valid ppl 2.9114 | learning rate 20.0000
| end of split 5 / 28 | epoch 6 | time: 3802.58s | valid loss 1.0688 | valid ppl 2.9120 | learning rate 20.0000
| end of split 6 / 28 | epoch 6 | time: 3800.26s | valid loss 1.0689 | valid ppl 2.9121 | learning rate 20.0000
| end of split 7 / 28 | epoch 6 | time: 3801.18s | valid loss 1.0683 | valid ppl 2.9103 | learning rate 20.0000
| end of split 8 / 28 | epoch 6 | time: 3805.98s | valid loss 1.0674 | valid ppl 2.9079 | learning rate 20.0000
| end of split 9 / 28 | epoch 6 | time: 3804.26s | valid loss 1.0674 | valid ppl 2.9078 | learning rate 20.0000
| end of split 10 / 28 | epoch 6 | time: 3797.98s | valid loss 1.0696 | valid ppl 2.9143 | learning rate 20.0000
| end of split 11 / 28 | epoch 6 | time: 3801.56s | valid loss 1.0679 | valid ppl 2.9093 | learning rate 20.0000
| end of split 12 / 28 | epoch 6 | time: 3802.48s | valid loss 1.0672 | valid ppl 2.9074 | learning rate 20.0000
| end of split 13 / 28 | epoch 6 | time: 3812.54s | valid loss 1.0673 | valid ppl 2.9076 | learning rate 20.0000
| end of split 14 / 28 | epoch 6 | time: 3816.47s | valid loss 1.0680 | valid ppl 2.9094 | learning rate 20.0000
| end of split 15 / 28 | epoch 6 | time: 3808.34s | valid loss 1.0670 | valid ppl 2.9067 | learning rate 20.0000
| end of split 16 / 28 | epoch 6 | time: 3810.71s | valid loss 1.0668 | valid ppl 2.9062 | learning rate 20.0000
| end of split 17 / 28 | epoch 6 | time: 3811.31s | valid loss 1.0657 | valid ppl 2.9028 | learning rate 20.0000
| end of split 18 / 28 | epoch 6 | time: 3808.51s | valid loss 1.0663 | valid ppl 2.9046 | learning rate 20.0000
| end of split 19 / 28 | epoch 6 | time: 3806.94s | valid loss 1.0660 | valid ppl 2.9039 | learning rate 20.0000
| end of split 20 / 28 | epoch 6 | time: 3804.47s | valid loss 1.0658 | valid ppl 2.9031 | learning rate 20.0000
| end of split 21 / 28 | epoch 6 | time: 3803.28s | valid loss 1.0657 | valid ppl 2.9029 | learning rate 20.0000
| end of split 22 / 28 | epoch 6 | time: 1098.89s | valid loss 1.0650 | valid ppl 2.9009 | learning rate 20.0000
| end of split 23 / 28 | epoch 6 | time: 3801.72s | valid loss 1.0658 | valid ppl 2.9030 | learning rate 20.0000
| end of split 24 / 28 | epoch 6 | time: 3808.12s | valid loss 1.0656 | valid ppl 2.9025 | learning rate 20.0000
| end of split 25 / 28 | epoch 6 | time: 3806.53s | valid loss 1.0679 | valid ppl 2.9094 | learning rate 20.0000
| end of split 26 / 28 | epoch 6 | time: 3800.71s | valid loss 1.0656 | valid ppl 2.9026 | learning rate 20.0000
| end of split 27 / 28 | epoch 6 | time: 3802.33s | valid loss 1.0645 | valid ppl 2.8994 | learning rate 20.0000
| end of split 28 / 28 | epoch 6 | time: 3797.75s | valid loss 1.0645 | valid ppl 2.8994 | learning rate 20.0000
| end of split 1 / 28 | epoch 7 | time: 3800.93s | valid loss 1.0649 | valid ppl 2.9004 | learning rate 20.0000
| end of split 2 / 28 | epoch 7 | time: 3803.64s | valid loss 1.0637 | valid ppl 2.8969 | learning rate 20.0000
| end of split 3 / 28 | epoch 7 | time: 3803.79s | valid loss 1.0636 | valid ppl 2.8968 | learning rate 20.0000
| end of split 4 / 28 | epoch 7 | time: 3805.63s | valid loss 1.0641 | valid ppl 2.8983 | learning rate 20.0000
| end of split 5 / 28 | epoch 7 | time: 3795.80s | valid loss 1.0629 | valid ppl 2.8947 | learning rate 20.0000
| end of split 6 / 28 | epoch 7 | time: 3807.54s | valid loss 1.0630 | valid ppl 2.8950 | learning rate 20.0000
| end of split 7 / 28 | epoch 7 | time: 3804.15s | valid loss 1.0640 | valid ppl 2.8980 | learning rate 20.0000
| end of split 8 / 28 | epoch 7 | time: 3803.94s | valid loss 1.0637 | valid ppl 2.8972 | learning rate 20.0000
| end of split 9 / 28 | epoch 7 | time: 3803.38s | valid loss 1.0634 | valid ppl 2.8962 | learning rate 20.0000
| end of split 10 / 28 | epoch 7 | time: 3806.34s | valid loss 1.0650 | valid ppl 2.9008 | learning rate 20.0000
| end of split 11 / 28 | epoch 7 | time: 1098.92s | valid loss 1.0622 | valid ppl 2.8926 | learning rate 20.0000
| end of split 12 / 28 | epoch 7 | time: 3803.81s | valid loss 1.0622 | valid ppl 2.8926 | learning rate 20.0000
| end of split 13 / 28 | epoch 7 | time: 3806.59s | valid loss 1.0630 | valid ppl 2.8949 | learning rate 20.0000
| end of split 14 / 28 | epoch 7 | time: 3803.04s | valid loss 1.0620 | valid ppl 2.8920 | learning rate 20.0000
| end of split 15 / 28 | epoch 7 | time: 3803.29s | valid loss 1.0619 | valid ppl 2.8920 | learning rate 20.0000
| end of split 16 / 28 | epoch 7 | time: 3802.60s | valid loss 1.0630 | valid ppl 2.8950 | learning rate 20.0000
| end of split 17 / 28 | epoch 7 | time: 3805.28s | valid loss 1.0621 | valid ppl 2.8925 | learning rate 20.0000
| end of split 18 / 28 | epoch 7 | time: 3800.72s | valid loss 1.0616 | valid ppl 2.8910 | learning rate 20.0000
| end of split 19 / 28 | epoch 7 | time: 3801.59s | valid loss 1.0615 | valid ppl 2.8907 | learning rate 20.0000
| end of split 20 / 28 | epoch 7 | time: 3803.04s | valid loss 1.0610 | valid ppl 2.8892 | learning rate 20.0000
| end of split 21 / 28 | epoch 7 | time: 3809.57s | valid loss 1.0597 | valid ppl 2.8855 | learning rate 20.0000
| end of split 22 / 28 | epoch 7 | time: 3802.88s | valid loss 1.0621 | valid ppl 2.8923 | learning rate 20.0000
| end of split 23 / 28 | epoch 7 | time: 3799.92s | valid loss 1.0612 | valid ppl 2.8900 | learning rate 20.0000
| end of split 24 / 28 | epoch 7 | time: 3804.46s | valid loss 1.0615 | valid ppl 2.8907 | learning rate 20.0000
| end of split 25 / 28 | epoch 7 | time: 3798.64s | valid loss 1.0599 | valid ppl 2.8862 | learning rate 20.0000
| end of split 26 / 28 | epoch 7 | time: 3799.12s | valid loss 1.0603 | valid ppl 2.8873 | learning rate 20.0000
| end of split 27 / 28 | epoch 7 | time: 3798.12s | valid loss 1.0606 | valid ppl 2.8880 | learning rate 20.0000
| end of split 28 / 28 | epoch 7 | time: 3805.05s | valid loss 1.0604 | valid ppl 2.8875 | learning rate 20.0000
| end of split 1 / 28 | epoch 8 | time: 3797.40s | valid loss 1.0600 | valid ppl 2.8863 | learning rate 20.0000
| end of split 2 / 28 | epoch 8 | time: 3796.23s | valid loss 1.0608 | valid ppl 2.8886 | learning rate 20.0000
| end of split 3 / 28 | epoch 8 | time: 3797.50s | valid loss 1.0626 | valid ppl 2.8940 | learning rate 20.0000
| end of split 4 / 28 | epoch 8 | time: 3798.81s | valid loss 1.0599 | valid ppl 2.8861 | learning rate 20.0000
| end of split 5 / 28 | epoch 8 | time: 3800.00s | valid loss 1.0562 | valid ppl 2.8756 | learning rate 5.0000
| end of split 6 / 28 | epoch 8 | time: 3806.43s | valid loss 1.0559 | valid ppl 2.8747 | learning rate 5.0000
| end of split 7 / 28 | epoch 8 | time: 3804.50s | valid loss 1.0557 | valid ppl 2.8739 | learning rate 5.0000
| end of split 8 / 28 | epoch 8 | time: 3803.18s | valid loss 1.0555 | valid ppl 2.8735 | learning rate 5.0000
| end of split 9 / 28 | epoch 8 | time: 1098.26s | valid loss 1.0555 | valid ppl 2.8734 | learning rate 5.0000
| end of split 10 / 28 | epoch 8 | time: 3803.32s | valid loss 1.0553 | valid ppl 2.8730 | learning rate 5.0000
| end of split 11 / 28 | epoch 8 | time: 3805.59s | valid loss 1.0553 | valid ppl 2.8728 | learning rate 5.0000
| end of split 12 / 28 | epoch 8 | time: 3798.28s | valid loss 1.0551 | valid ppl 2.8724 | learning rate 5.0000
| end of split 13 / 28 | epoch 8 | time: 3798.22s | valid loss 1.0551 | valid ppl 2.8722 | learning rate 5.0000
| end of split 14 / 28 | epoch 8 | time: 3798.98s | valid loss 1.0550 | valid ppl 2.8720 | learning rate 5.0000
| end of split 15 / 28 | epoch 8 | time: 3796.37s | valid loss 1.0550 | valid ppl 2.8719 | learning rate 5.0000
| end of split 16 / 28 | epoch 8 | time: 3792.33s | valid loss 1.0549 | valid ppl 2.8717 | learning rate 5.0000
| end of split 17 / 28 | epoch 8 | time: 3801.12s | valid loss 1.0548 | valid ppl 2.8715 | learning rate 5.0000
| end of split 18 / 28 | epoch 8 | time: 3803.54s | valid loss 1.0548 | valid ppl 2.8713 | learning rate 5.0000
| end of split 19 / 28 | epoch 8 | time: 3794.99s | valid loss 1.0547 | valid ppl 2.8712 | learning rate 5.0000
| end of split 20 / 28 | epoch 8 | time: 3800.67s | valid loss 1.0546 | valid ppl 2.8709 | learning rate 5.0000
| end of split 21 / 28 | epoch 8 | time: 3802.07s | valid loss 1.0547 | valid ppl 2.8710 | learning rate 5.0000
| end of split 22 / 28 | epoch 8 | time: 3795.63s | valid loss 1.0546 | valid ppl 2.8707 | learning rate 5.0000
| end of split 23 / 28 | epoch 8 | time: 3797.48s | valid loss 1.0545 | valid ppl 2.8705 | learning rate 5.0000
| end of split 24 / 28 | epoch 8 | time: 3826.24s | valid loss 1.0545 | valid ppl 2.8705 | learning rate 5.0000
| end of split 25 / 28 | epoch 8 | time: 3796.29s | valid loss 1.0543 | valid ppl 2.8701 | learning rate 5.0000
| end of split 26 / 28 | epoch 8 | time: 3803.96s | valid loss 1.0545 | valid ppl 2.8705 | learning rate 5.0000
| end of split 27 / 28 | epoch 8 | time: 3802.34s | valid loss 1.0543 | valid ppl 2.8700 | learning rate 5.0000
| end of split 28 / 28 | epoch 8 | time: 3803.96s | valid loss 1.0543 | valid ppl 2.8699 | learning rate 5.0000
| end of split 1 / 28 | epoch 9 | time: 3798.65s | valid loss 1.0542 | valid ppl 2.8697 | learning rate 5.0000
| end of split 2 / 28 | epoch 9 | time: 3801.55s | valid loss 1.0542 | valid ppl 2.8696 | learning rate 5.0000
| end of split 3 / 28 | epoch 9 | time: 3806.56s | valid loss 1.0541 | valid ppl 2.8693 | learning rate 5.0000
| end of split 4 / 28 | epoch 9 | time: 3801.41s | valid loss 1.0541 | valid ppl 2.8695 | learning rate 5.0000
| end of split 5 / 28 | epoch 9 | time: 3799.18s | valid loss 1.0540 | valid ppl 2.8692 | learning rate 5.0000
| end of split 6 / 28 | epoch 9 | time: 3801.41s | valid loss 1.0540 | valid ppl 2.8690 | learning rate 5.0000
| end of split 7 / 28 | epoch 9 | time: 3792.65s | valid loss 1.0539 | valid ppl 2.8687 | learning rate 5.0000
| end of split 8 / 28 | epoch 9 | time: 3801.50s | valid loss 1.0539 | valid ppl 2.8688 | learning rate 5.0000
| end of split 9 / 28 | epoch 9 | time: 3799.22s | valid loss 1.0539 | valid ppl 2.8689 | learning rate 5.0000
| end of split 10 / 28 | epoch 9 | time: 3798.30s | valid loss 1.0537 | valid ppl 2.8683 | learning rate 5.0000
| end of split 11 / 28 | epoch 9 | time: 3794.81s | valid loss 1.0537 | valid ppl 2.8682 | learning rate 5.0000
| end of split 12 / 28 | epoch 9 | time: 3794.04s | valid loss 1.0537 | valid ppl 2.8682 | learning rate 5.0000
| end of split 13 / 28 | epoch 9 | time: 3798.63s | valid loss 1.0537 | valid ppl 2.8683 | learning rate 5.0000
| end of split 14 / 28 | epoch 9 | time: 3797.90s | valid loss 1.0535 | valid ppl 2.8678 | learning rate 5.0000
| end of split 15 / 28 | epoch 9 | time: 3796.44s | valid loss 1.0536 | valid ppl 2.8680 | learning rate 5.0000
| end of split 16 / 28 | epoch 9 | time: 3798.41s | valid loss 1.0536 | valid ppl 2.8678 | learning rate 5.0000
| end of split 17 / 28 | epoch 9 | time: 3799.93s | valid loss 1.0535 | valid ppl 2.8676 | learning rate 5.0000
| end of split 18 / 28 | epoch 9 | time: 3803.40s | valid loss 1.0534 | valid ppl 2.8673 | learning rate 5.0000
| end of split 19 / 28 | epoch 9 | time: 3807.52s | valid loss 1.0537 | valid ppl 2.8683 | learning rate 5.0000
| end of split 20 / 28 | epoch 9 | time: 3807.58s | valid loss 1.0534 | valid ppl 2.8673 | learning rate 5.0000
| end of split 21 / 28 | epoch 9 | time: 3799.18s | valid loss 1.0533 | valid ppl 2.8672 | learning rate 5.0000
| end of split 22 / 28 | epoch 9 | time: 3800.62s | valid loss 1.0532 | valid ppl 2.8668 | learning rate 5.0000
| end of split 23 / 28 | epoch 9 | time: 3796.79s | valid loss 1.0532 | valid ppl 2.8667 | learning rate 5.0000
| end of split 24 / 28 | epoch 9 | time: 1097.06s | valid loss 1.0532 | valid ppl 2.8669 | learning rate 5.0000
| end of split 25 / 28 | epoch 9 | time: 3795.86s | valid loss 1.0532 | valid ppl 2.8669 | learning rate 5.0000
| end of split 26 / 28 | epoch 9 | time: 3803.14s | valid loss 1.0531 | valid ppl 2.8665 | learning rate 5.0000
| end of split 27 / 28 | epoch 9 | time: 3798.92s | valid loss 1.0530 | valid ppl 2.8663 | learning rate 5.0000
| end of split 28 / 28 | epoch 9 | time: 3799.90s | valid loss 1.0530 | valid ppl 2.8663 | learning rate 5.0000
| end of split 1 / 28 | epoch 10 | time: 3798.57s | valid loss 1.0530 | valid ppl 2.8662 | learning rate 5.0000
| end of split 2 / 28 | epoch 10 | time: 3798.13s | valid loss 1.0529 | valid ppl 2.8661 | learning rate 5.0000
| end of split 3 / 28 | epoch 10 | time: 3799.82s | valid loss 1.0530 | valid ppl 2.8662 | learning rate 5.0000
| end of split 4 / 28 | epoch 10 | time: 3802.23s | valid loss 1.0529 | valid ppl 2.8659 | learning rate 5.0000
| end of split 5 / 28 | epoch 10 | time: 3801.56s | valid loss 1.0529 | valid ppl 2.8660 | learning rate 5.0000
| end of split 6 / 28 | epoch 10 | time: 3798.08s | valid loss 1.0528 | valid ppl 2.8656 | learning rate 5.0000
| end of split 7 / 28 | epoch 10 | time: 3800.12s | valid loss 1.0528 | valid ppl 2.8656 | learning rate 5.0000
| end of split 8 / 28 | epoch 10 | time: 3800.94s | valid loss 1.0526 | valid ppl 2.8652 | learning rate 5.0000
| end of split 9 / 28 | epoch 10 | time: 3801.43s | valid loss 1.0529 | valid ppl 2.8659 | learning rate 5.0000
| end of split 10 / 28 | epoch 10 | time: 3798.47s | valid loss 1.0526 | valid ppl 2.8652 | learning rate 5.0000
| end of split 11 / 28 | epoch 10 | time: 3803.15s | valid loss 1.0526 | valid ppl 2.8650 | learning rate 5.0000
| end of split 12 / 28 | epoch 10 | time: 3800.32s | valid loss 1.0526 | valid ppl 2.8650 | learning rate 5.0000
| end of split 13 / 28 | epoch 10 | time: 3802.61s | valid loss 1.0525 | valid ppl 2.8647 | learning rate 5.0000
| end of split 14 / 28 | epoch 10 | time: 3799.08s | valid loss 1.0525 | valid ppl 2.8648 | learning rate 5.0000
| end of split 15 / 28 | epoch 10 | time: 3801.19s | valid loss 1.0525 | valid ppl 2.8647 | learning rate 5.0000
| end of split 16 / 28 | epoch 10 | time: 3801.20s | valid loss 1.0524 | valid ppl 2.8646 | learning rate 5.0000
| end of split 17 / 28 | epoch 10 | time: 3802.37s | valid loss 1.0524 | valid ppl 2.8645 | learning rate 5.0000
| end of split 18 / 28 | epoch 10 | time: 3805.85s | valid loss 1.0523 | valid ppl 2.8643 | learning rate 5.0000
| end of split 19 / 28 | epoch 10 | time: 3804.15s | valid loss 1.0524 | valid ppl 2.8644 | learning rate 5.0000
| end of split 20 / 28 | epoch 10 | time: 3806.41s | valid loss 1.0523 | valid ppl 2.8642 | learning rate 5.0000
| end of split 21 / 28 | epoch 10 | time: 3809.13s | valid loss 1.0522 | valid ppl 2.8639 | learning rate 5.0000
| end of split 22 / 28 | epoch 10 | time: 3798.99s | valid loss 1.0523 | valid ppl 2.8641 | learning rate 5.0000
| end of split 23 / 28 | epoch 10 | time: 3802.76s | valid loss 1.0522 | valid ppl 2.8639 | learning rate 5.0000
| end of split 24 / 28 | epoch 10 | time: 3805.95s | valid loss 1.0522 | valid ppl 2.8639 | learning rate 5.0000
| end of split 25 / 28 | epoch 10 | time: 3803.67s | valid loss 1.0522 | valid ppl 2.8639 | learning rate 5.0000
| end of split 26 / 28 | epoch 10 | time: 3802.75s | valid loss 1.0521 | valid ppl 2.8635 | learning rate 5.0000
| end of split 27 / 28 | epoch 10 | time: 3804.63s | valid loss 1.0520 | valid ppl 2.8633 | learning rate 5.0000
| end of split 28 / 28 | epoch 10 | time: 1097.97s | valid loss 1.0520 | valid ppl 2.8634 | learning rate 5.0000
| end of split 1 / 28 | epoch 11 | time: 3793.51s | valid loss 1.0520 | valid ppl 2.8634 | learning rate 5.0000
| end of split 2 / 28 | epoch 11 | time: 3802.15s | valid loss 1.0520 | valid ppl 2.8633 | learning rate 5.0000
| end of split 3 / 28 | epoch 11 | time: 3801.09s | valid loss 1.0518 | valid ppl 2.8629 | learning rate 5.0000
| end of split 4 / 28 | epoch 11 | time: 3803.88s | valid loss 1.0518 | valid ppl 2.8629 | learning rate 5.0000
| end of split 5 / 28 | epoch 11 | time: 3803.72s | valid loss 1.0518 | valid ppl 2.8628 | learning rate 5.0000
| end of split 6 / 28 | epoch 11 | time: 3803.50s | valid loss 1.0518 | valid ppl 2.8629 | learning rate 5.0000
| end of split 7 / 28 | epoch 11 | time: 3798.93s | valid loss 1.0518 | valid ppl 2.8627 | learning rate 5.0000
| end of split 8 / 28 | epoch 11 | time: 3798.59s | valid loss 1.0516 | valid ppl 2.8623 | learning rate 5.0000
| end of split 9 / 28 | epoch 11 | time: 3797.52s | valid loss 1.0517 | valid ppl 2.8624 | learning rate 5.0000
| end of split 10 / 28 | epoch 11 | time: 3806.92s | valid loss 1.0518 | valid ppl 2.8627 | learning rate 5.0000
| end of split 11 / 28 | epoch 11 | time: 3806.04s | valid loss 1.0516 | valid ppl 2.8622 | learning rate 5.0000
| end of split 12 / 28 | epoch 11 | time: 3801.39s | valid loss 1.0519 | valid ppl 2.8632 | learning rate 5.0000
| end of split 13 / 28 | epoch 11 | time: 3801.24s | valid loss 1.0516 | valid ppl 2.8622 | learning rate 5.0000
| end of split 14 / 28 | epoch 11 | time: 3804.44s | valid loss 1.0515 | valid ppl 2.8620 | learning rate 5.0000
| end of split 15 / 28 | epoch 11 | time: 3801.34s | valid loss 1.0515 | valid ppl 2.8620 | learning rate 5.0000
| end of split 16 / 28 | epoch 11 | time: 3803.14s | valid loss 1.0514 | valid ppl 2.8618 | learning rate 5.0000
| end of split 17 / 28 | epoch 11 | time: 3801.11s | valid loss 1.0514 | valid ppl 2.8617 | learning rate 5.0000
| end of split 18 / 28 | epoch 11 | time: 3804.58s | valid loss 1.0513 | valid ppl 2.8613 | learning rate 5.0000
| end of split 19 / 28 | epoch 11 | time: 3796.04s | valid loss 1.0513 | valid ppl 2.8615 | learning rate 5.0000
| end of split 20 / 28 | epoch 11 | time: 3797.12s | valid loss 1.0512 | valid ppl 2.8611 | learning rate 5.0000
| end of split 21 / 28 | epoch 11 | time: 1097.96s | valid loss 1.0512 | valid ppl 2.8612 | learning rate 5.0000
| end of split 22 / 28 | epoch 11 | time: 3800.79s | valid loss 1.0513 | valid ppl 2.8613 | learning rate 5.0000
| end of split 23 / 28 | epoch 11 | time: 3801.51s | valid loss 1.0518 | valid ppl 2.8629 | learning rate 5.0000
| end of split 24 / 28 | epoch 11 | time: 3798.63s | valid loss 1.0513 | valid ppl 2.8614 | learning rate 5.0000
| end of split 25 / 28 | epoch 11 | time: 3796.99s | valid loss 1.0512 | valid ppl 2.8612 | learning rate 5.0000
| end of split 26 / 28 | epoch 11 | time: 3797.77s | valid loss 1.0512 | valid ppl 2.8610 | learning rate 5.0000
| end of split 27 / 28 | epoch 11 | time: 3797.73s | valid loss 1.0512 | valid ppl 2.8610 | learning rate 5.0000
| end of split 28 / 28 | epoch 11 | time: 3800.03s | valid loss 1.0511 | valid ppl 2.8607 | learning rate 5.0000
| end of split 1 / 28 | epoch 12 | time: 3796.72s | valid loss 1.0511 | valid ppl 2.8609 | learning rate 5.0000
| end of split 2 / 28 | epoch 12 | time: 1097.45s | valid loss 1.0510 | valid ppl 2.8604 | learning rate 5.0000
| end of split 3 / 28 | epoch 12 | time: 3803.10s | valid loss 1.0510 | valid ppl 2.8606 | learning rate 5.0000
| end of split 4 / 28 | epoch 12 | time: 3803.38s | valid loss 1.0510 | valid ppl 2.8604 | learning rate 5.0000
| end of split 5 / 28 | epoch 12 | time: 3796.86s | valid loss 1.0509 | valid ppl 2.8602 | learning rate 5.0000
| end of split 6 / 28 | epoch 12 | time: 3804.85s | valid loss 1.0509 | valid ppl 2.8601 | learning rate 5.0000
| end of split 7 / 28 | epoch 12 | time: 3804.65s | valid loss 1.0509 | valid ppl 2.8601 | learning rate 5.0000
| end of split 8 / 28 | epoch 12 | time: 3806.75s | valid loss 1.0508 | valid ppl 2.8599 | learning rate 5.0000
| end of split 9 / 28 | epoch 12 | time: 3800.05s | valid loss 1.0507 | valid ppl 2.8597 | learning rate 5.0000
| end of split 10 / 28 | epoch 12 | time: 3802.67s | valid loss 1.0507 | valid ppl 2.8596 | learning rate 5.0000
| end of split 11 / 28 | epoch 12 | time: 3806.56s | valid loss 1.0508 | valid ppl 2.8598 | learning rate 5.0000
| end of split 12 / 28 | epoch 12 | time: 3804.49s | valid loss 1.0507 | valid ppl 2.8598 | learning rate 5.0000
| end of split 13 / 28 | epoch 12 | time: 3804.60s | valid loss 1.0507 | valid ppl 2.8595 | learning rate 5.0000
| end of split 14 / 28 | epoch 12 | time: 3799.49s | valid loss 1.0506 | valid ppl 2.8594 | learning rate 5.0000
| end of split 15 / 28 | epoch 12 | time: 3807.23s | valid loss 1.0506 | valid ppl 2.8595 | learning rate 5.0000
| end of split 16 / 28 | epoch 12 | time: 3798.38s | valid loss 1.0506 | valid ppl 2.8592 | learning rate 5.0000
| end of split 17 / 28 | epoch 12 | time: 3806.09s | valid loss 1.0506 | valid ppl 2.8595 | learning rate 5.0000
| end of split 18 / 28 | epoch 12 | time: 3797.37s | valid loss 1.0506 | valid ppl 2.8594 | learning rate 5.0000
| end of split 19 / 28 | epoch 12 | time: 3800.94s | valid loss 1.0505 | valid ppl 2.8589 | learning rate 5.0000
| end of split 20 / 28 | epoch 12 | time: 3796.71s | valid loss 1.0505 | valid ppl 2.8590 | learning rate 5.0000
| end of split 21 / 28 | epoch 12 | time: 3795.95s | valid loss 1.0504 | valid ppl 2.8588 | learning rate 5.0000
| end of split 22 / 28 | epoch 12 | time: 3793.39s | valid loss 1.0504 | valid ppl 2.8588 | learning rate 5.0000
| end of split 23 / 28 | epoch 12 | time: 3797.13s | valid loss 1.0503 | valid ppl 2.8586 | learning rate 5.0000
| end of split 24 / 28 | epoch 12 | time: 3802.93s | valid loss 1.0503 | valid ppl 2.8586 | learning rate 5.0000
| end of split 25 / 28 | epoch 12 | time: 3798.55s | valid loss 1.0502 | valid ppl 2.8582 | learning rate 5.0000
| end of split 26 / 28 | epoch 12 | time: 3797.73s | valid loss 1.0502 | valid ppl 2.8582 | learning rate 5.0000
| end of split 27 / 28 | epoch 12 | time: 3798.53s | valid loss 1.0502 | valid ppl 2.8582 | learning rate 5.0000
| end of split 28 / 28 | epoch 12 | time: 3797.17s | valid loss 1.0502 | valid ppl 2.8582 | learning rate 5.0000
| end of split 29 / 28 | epoch 12 | time: 3798.57s | valid loss 1.0501 | valid ppl 2.8581 | learning rate 5.0000
| end of split 30 / 28 | epoch 12 | time: 3779.22s | valid loss 1.0500 | valid ppl 2.8577 | learning rate 5.0000
| end of split 31 / 28 | epoch 12 | time: 3780.51s | valid loss 1.0500 | valid ppl 2.8576 | learning rate 5.0000
| end of split 32 / 28 | epoch 12 | time: 3779.10s | valid loss 1.0500 | valid ppl 2.8576 | learning rate 5.0000
| end of split 33 / 28 | epoch 12 | time: 1096.89s | valid loss 1.0500 | valid ppl 2.8576 | learning rate 5.0000
| end of split 34 / 28 | epoch 12 | time: 3777.57s | valid loss 1.0499 | valid ppl 2.8574 | learning rate 5.0000
| end of split 35 / 28 | epoch 12 | time: 3779.50s | valid loss 1.0501 | valid ppl 2.8581 | learning rate 5.0000
| end of split 36 / 28 | epoch 12 | time: 3782.16s | valid loss 1.0499 | valid ppl 2.8573 | learning rate 5.0000
| end of split 37 / 28 | epoch 12 | time: 3777.44s | valid loss 1.0498 | valid ppl 2.8572 | learning rate 5.0000
| end of split 38 / 28 | epoch 12 | time: 3777.04s | valid loss 1.0499 | valid ppl 2.8573 | learning rate 5.0000
| end of split 39 / 28 | epoch 12 | time: 3774.81s | valid loss 1.0501 | valid ppl 2.8580 | learning rate 5.0000
| end of split 40 / 28 | epoch 12 | time: 3775.55s | valid loss 1.0498 | valid ppl 2.8570 | learning rate 5.0000
| end of split 41 / 28 | epoch 12 | time: 3780.06s | valid loss 1.0498 | valid ppl 2.8569 | learning rate 5.0000
| end of split 42 / 28 | epoch 12 | time: 3781.04s | valid loss 1.0497 | valid ppl 2.8567 | learning rate 5.0000
| end of split 43 / 28 | epoch 12 | time: 3778.87s | valid loss 1.0496 | valid ppl 2.8565 | learning rate 5.0000
| end of split 44 / 28 | epoch 12 | time: 3778.19s | valid loss 1.0496 | valid ppl 2.8566 | learning rate 5.0000
| end of split 45 / 28 | epoch 12 | time: 3780.17s | valid loss 1.0496 | valid ppl 2.8565 | learning rate 5.0000
| end of split 46 / 28 | epoch 12 | time: 3778.47s | valid loss 1.0496 | valid ppl 2.8564 | learning rate 5.0000
| end of split 47 / 28 | epoch 12 | time: 3780.63s | valid loss 1.0495 | valid ppl 2.8563 | learning rate 5.0000
| end of split 48 / 28 | epoch 12 | time: 3783.64s | valid loss 1.0495 | valid ppl 2.8563 | learning rate 5.0000
| end of split 49 / 28 | epoch 12 | time: 3783.57s | valid loss 1.0496 | valid ppl 2.8566 | learning rate 5.0000
| end of split 50 / 28 | epoch 12 | time: 3781.86s | valid loss 1.0495 | valid ppl 2.8562 | learning rate 5.0000
| end of split 51 / 28 | epoch 12 | time: 3785.70s | valid loss 1.0494 | valid ppl 2.8559 | learning rate 5.0000
| end of split 52 / 28 | epoch 12 | time: 3789.97s | valid loss 1.0494 | valid ppl 2.8559 | learning rate 5.0000
| end of split 53 / 28 | epoch 12 | time: 3790.93s | valid loss 1.0494 | valid ppl 2.8559 | learning rate 5.0000
| end of split 54 / 28 | epoch 12 | time: 3809.97s | valid loss 1.0493 | valid ppl 2.8558 | learning rate 5.0000
| end of split 55 / 28 | epoch 12 | time: 3815.38s | valid loss 1.0494 | valid ppl 2.8559 | learning rate 5.0000
| end of split 56 / 28 | epoch 12 | time: 3823.98s | valid loss 1.0492 | valid ppl 2.8554 | learning rate 5.0000
| end of split 57 / 28 | epoch 12 | time: 3821.56s | valid loss 1.0493 | valid ppl 2.8556 | learning rate 5.0000
| end of split 30 / 28 | epoch 13 | time: 3781.69s | valid loss 1.0493 | valid ppl 2.8558 | learning rate 5.0000
| end of split 31 / 28 | epoch 13 | time: 3822.49s | valid loss 1.0492 | valid ppl 2.8552 | learning rate 5.0000
| end of split 32 / 28 | epoch 13 | time: 3826.45s | valid loss 1.0491 | valid ppl 2.8552 | learning rate 5.0000
| end of split 33 / 28 | epoch 13 | time: 3825.81s | valid loss 1.0491 | valid ppl 2.8550 | learning rate 5.0000
| end of split 34 / 28 | epoch 13 | time: 3825.85s | valid loss 1.0490 | valid ppl 2.8549 | learning rate 5.0000
| end of split 35 / 28 | epoch 13 | time: 3805.21s | valid loss 1.0491 | valid ppl 2.8551 | learning rate 5.0000
| end of split 36 / 28 | epoch 13 | time: 3833.10s | valid loss 1.0490 | valid ppl 2.8547 | learning rate 5.0000
| end of split 37 / 28 | epoch 13 | time: 3790.99s | valid loss 1.0489 | valid ppl 2.8545 | learning rate 5.0000
| end of split 38 / 28 | epoch 13 | time: 3794.98s | valid loss 1.0490 | valid ppl 2.8547 | learning rate 5.0000
| end of split 39 / 28 | epoch 13 | time: 3794.00s | valid loss 1.0490 | valid ppl 2.8547 | learning rate 5.0000
| end of split 40 / 28 | epoch 13 | time: 3781.45s | valid loss 1.0490 | valid ppl 2.8547 | learning rate 5.0000
| end of split 41 / 28 | epoch 13 | time: 1101.74s | valid loss 1.0489 | valid ppl 2.8546 | learning rate 5.0000
| end of split 42 / 28 | epoch 13 | time: 3800.72s | valid loss 1.0489 | valid ppl 2.8546 | learning rate 5.0000
| end of split 43 / 28 | epoch 13 | time: 3797.95s | valid loss 1.0489 | valid ppl 2.8544 | learning rate 5.0000
| end of split 44 / 28 | epoch 13 | time: 3801.59s | valid loss 1.0489 | valid ppl 2.8544 | learning rate 5.0000
| end of split 45 / 28 | epoch 13 | time: 3798.16s | valid loss 1.0488 | valid ppl 2.8542 | learning rate 5.0000
| end of split 46 / 28 | epoch 13 | time: 3838.78s | valid loss 1.0488 | valid ppl 2.8542 | learning rate 5.0000
| end of split 47 / 28 | epoch 13 | time: 3839.26s | valid loss 1.0487 | valid ppl 2.8540 | learning rate 5.0000
| end of split 48 / 28 | epoch 13 | time: 3795.07s | valid loss 1.0487 | valid ppl 2.8540 | learning rate 5.0000
| end of split 49 / 28 | epoch 13 | time: 3838.27s | valid loss 1.0487 | valid ppl 2.8539 | learning rate 5.0000
| end of split 50 / 28 | epoch 13 | time: 3797.61s | valid loss 1.0487 | valid ppl 2.8538 | learning rate 5.0000
| end of split 51 / 28 | epoch 13 | time: 3798.41s | valid loss 1.0487 | valid ppl 2.8539 | learning rate 5.0000
| end of split 52 / 28 | epoch 13 | time: 3799.35s | valid loss 1.0488 | valid ppl 2.8543 | learning rate 5.0000
| end of split 53 / 28 | epoch 13 | time: 3801.21s | valid loss 1.0488 | valid ppl 2.8544 | learning rate 5.0000
| end of split 54 / 28 | epoch 13 | time: 3797.20s | valid loss 1.0486 | valid ppl 2.8537 | learning rate 5.0000
| end of split 55 / 28 | epoch 13 | time: 3798.03s | valid loss 1.0486 | valid ppl 2.8536 | learning rate 5.0000
| end of split 56 / 28 | epoch 13 | time: 33927.42s | valid loss 1.0486 | valid ppl 2.8536 | learning rate 5.0000
| end of split 57 / 28 | epoch 13 | time: 3775.50s | valid loss 1.0485 | valid ppl 2.8533 | learning rate 5.0000
| end of split 58 / 28 | epoch 13 | time: 3783.03s | valid loss 1.0485 | valid ppl 2.8532 | learning rate 5.0000
| end of split 59 / 28 | epoch 13 | time: 3779.76s | valid loss 1.0485 | valid ppl 2.8535 | learning rate 5.0000
| end of split 60 / 28 | epoch 13 | time: 3785.13s | valid loss 1.0484 | valid ppl 2.8530 | learning rate 5.0000
| end of split 61 / 28 | epoch 13 | time: 3783.81s | valid loss 1.0485 | valid ppl 2.8533 | learning rate 5.0000
| end of split 62 / 28 | epoch 13 | time: 3780.75s | valid loss 1.0484 | valid ppl 2.8530 | learning rate 5.0000
| end of split 63 / 28 | epoch 13 | time: 3780.18s | valid loss 1.0484 | valid ppl 2.8530 | learning rate 5.0000
| end of split 64 / 28 | epoch 13 | time: 1095.57s | valid loss 1.0483 | valid ppl 2.8529 | learning rate 5.0000
| end of split 65 / 28 | epoch 13 | time: 3781.51s | valid loss 1.0483 | valid ppl 2.8528 | learning rate 5.0000
| end of split 66 / 28 | epoch 13 | time: 3779.93s | valid loss 1.0483 | valid ppl 2.8527 | learning rate 5.0000
| end of split 67 / 28 | epoch 13 | time: 3780.02s | valid loss 1.0482 | valid ppl 2.8526 | learning rate 5.0000
| end of split 68 / 28 | epoch 13 | time: 3779.33s | valid loss 1.0481 | valid ppl 2.8522 | learning rate 5.0000
| end of split 69 / 28 | epoch 13 | time: 3780.57s | valid loss 1.0482 | valid ppl 2.8526 | learning rate 5.0000
| end of split 70 / 28 | epoch 13 | time: 3781.85s | valid loss 1.0481 | valid ppl 2.8522 | learning rate 5.0000
| end of split 71 / 28 | epoch 13 | time: 26921.31s | valid loss 1.0481 | valid ppl 2.8523 | learning rate 5.0000
| end of split 72 / 28 | epoch 13 | time: 3773.62s | valid loss 1.0481 | valid ppl 2.8522 | learning rate 5.0000
| end of split 73 / 28 | epoch 13 | time: 3781.75s | valid loss 1.0480 | valid ppl 2.8521 | learning rate 5.0000
| end of split 74 / 28 | epoch 13 | time: 3781.92s | valid loss 1.0479 | valid ppl 2.8518 | learning rate 5.0000
| end of split 75 / 28 | epoch 13 | time: 3775.75s | valid loss 1.0480 | valid ppl 2.8519 | learning rate 5.0000
| end of split 48 / 28 | epoch 14 | time: 3775.65s | valid loss 1.0480 | valid ppl 2.8520 | learning rate 5.0000
| end of split 49 / 28 | epoch 14 | time: 3781.53s | valid loss 1.0479 | valid ppl 2.8516 | learning rate 5.0000
| end of split 50 / 28 | epoch 14 | time: 3781.74s | valid loss 1.0479 | valid ppl 2.8516 | learning rate 5.0000
| end of split 51 / 28 | epoch 14 | time: 3780.09s | valid loss 1.0479 | valid ppl 2.8515 | learning rate 5.0000
| end of split 52 / 28 | epoch 14 | time: 3774.99s | valid loss 1.0478 | valid ppl 2.8514 | learning rate 5.0000
| end of split 53 / 28 | epoch 14 | time: 3773.17s | valid loss 1.0478 | valid ppl 2.8514 | learning rate 5.0000
| end of split 54 / 28 | epoch 14 | time: 3782.51s | valid loss 1.0478 | valid ppl 2.8514 | learning rate 5.0000
| end of split 55 / 28 | epoch 14 | time: 1091.47s | valid loss 1.0478 | valid ppl 2.8513 | learning rate 5.0000
| end of split 56 / 28 | epoch 14 | time: 3770.65s | valid loss 1.0477 | valid ppl 2.8512 | learning rate 5.0000
| end of split 57 / 28 | epoch 14 | time: 3772.52s | valid loss 1.0478 | valid ppl 2.8515 | learning rate 5.0000
| end of split 58 / 28 | epoch 14 | time: 3779.35s | valid loss 1.0477 | valid ppl 2.8512 | learning rate 5.0000
| end of split 59 / 28 | epoch 14 | time: 3781.23s | valid loss 1.0478 | valid ppl 2.8513 | learning rate 5.0000
| end of split 60 / 28 | epoch 14 | time: 3776.28s | valid loss 1.0477 | valid ppl 2.8511 | learning rate 5.0000
| end of split 61 / 28 | epoch 14 | time: 3776.23s | valid loss 1.0477 | valid ppl 2.8510 | learning rate 5.0000
| end of split 62 / 28 | epoch 14 | time: 3780.36s | valid loss 1.0476 | valid ppl 2.8509 | learning rate 5.0000
| end of split 63 / 28 | epoch 14 | time: 3778.91s | valid loss 1.0476 | valid ppl 2.8509 | learning rate 5.0000
| end of split 64 / 28 | epoch 14 | time: 3778.32s | valid loss 1.0475 | valid ppl 2.8504 | learning rate 5.0000
| end of split 65 / 28 | epoch 14 | time: 3779.46s | valid loss 1.0476 | valid ppl 2.8508 | learning rate 5.0000
| end of split 66 / 28 | epoch 14 | time: 3776.90s | valid loss 1.0476 | valid ppl 2.8507 | learning rate 5.0000
| end of split 67 / 28 | epoch 14 | time: 3779.74s | valid loss 1.0475 | valid ppl 2.8505 | learning rate 5.0000
| end of split 68 / 28 | epoch 14 | time: 3783.67s | valid loss 1.0475 | valid ppl 2.8506 | learning rate 5.0000
| end of split 69 / 28 | epoch 14 | time: 3779.38s | valid loss 1.0474 | valid ppl 2.8503 | learning rate 5.0000
| end of split 70 / 28 | epoch 14 | time: 3779.94s | valid loss 1.0473 | valid ppl 2.8499 | learning rate 5.0000
| end of split 71 / 28 | epoch 14 | time: 3778.53s | valid loss 1.0474 | valid ppl 2.8503 | learning rate 5.0000
| end of split 72 / 28 | epoch 14 | time: 3776.87s | valid loss 1.0473 | valid ppl 2.8499 | learning rate 5.0000
| end of split 73 / 28 | epoch 14 | time: 3781.07s | valid loss 1.0472 | valid ppl 2.8498 | learning rate 5.0000
| end of split 74 / 28 | epoch 14 | time: 3780.24s | valid loss 1.0472 | valid ppl 2.8498 | learning rate 5.0000
| end of split 75 / 28 | epoch 14 | time: 3783.57s | valid loss 1.0473 | valid ppl 2.8499 | learning rate 5.0000
| end of split 48 / 28 | epoch 15 | time: 3773.51s | valid loss 1.0473 | valid ppl 2.8500 | learning rate 5.0000
| end of split 49 / 28 | epoch 15 | time: 3776.38s | valid loss 1.0472 | valid ppl 2.8497 | learning rate 5.0000
| end of split 50 / 28 | epoch 15 | time: 3780.53s | valid loss 1.0472 | valid ppl 2.8496 | learning rate 5.0000
| end of split 51 / 28 | epoch 15 | time: 3773.64s | valid loss 1.0472 | valid ppl 2.8497 | learning rate 5.0000
| end of split 52 / 28 | epoch 15 | time: 3776.44s | valid loss 1.0471 | valid ppl 2.8494 | learning rate 5.0000
| end of split 53 / 28 | epoch 15 | time: 3777.32s | valid loss 1.0470 | valid ppl 2.8491 | learning rate 5.0000
| end of split 54 / 28 | epoch 15 | time: 3780.27s | valid loss 1.0471 | valid ppl 2.8495 | learning rate 5.0000
| end of split 55 / 28 | epoch 15 | time: 3780.34s | valid loss 1.0471 | valid ppl 2.8492 | learning rate 5.0000
| end of split 56 / 28 | epoch 15 | time: 3779.87s | valid loss 1.0470 | valid ppl 2.8490 | learning rate 5.0000
| end of split 57 / 28 | epoch 15 | time: 3776.07s | valid loss 1.0470 | valid ppl 2.8492 | learning rate 5.0000
| end of split 58 / 28 | epoch 15 | time: 3777.39s | valid loss 1.0469 | valid ppl 2.8488 | learning rate 5.0000
| end of split 59 / 28 | epoch 15 | time: 3779.77s | valid loss 1.0470 | valid ppl 2.8491 | learning rate 5.0000
| end of split 60 / 28 | epoch 15 | time: 3780.19s | valid loss 1.0470 | valid ppl 2.8492 | learning rate 5.0000
| end of split 61 / 28 | epoch 15 | time: 3781.19s | valid loss 1.0469 | valid ppl 2.8489 | learning rate 5.0000
| end of split 62 / 28 | epoch 15 | time: 3778.56s | valid loss 1.0470 | valid ppl 2.8490 | learning rate 5.0000
| end of split 63 / 28 | epoch 15 | time: 3779.68s | valid loss 1.0469 | valid ppl 2.8488 | learning rate 5.0000
| end of split 64 / 28 | epoch 15 | time: 1092.12s | valid loss 1.0469 | valid ppl 2.8487 | learning rate 5.0000
| end of split 65 / 28 | epoch 15 | time: 3781.19s | valid loss 1.0468 | valid ppl 2.8484 | learning rate 5.0000
| end of split 66 / 28 | epoch 15 | time: 3782.54s | valid loss 1.0472 | valid ppl 2.8498 | learning rate 5.0000
| end of split 67 / 28 | epoch 15 | time: 3782.53s | valid loss 1.0469 | valid ppl 2.8487 | learning rate 5.0000
| end of split 68 / 28 | epoch 15 | time: 3782.19s | valid loss 1.0467 | valid ppl 2.8484 | learning rate 5.0000
| end of split 69 / 28 | epoch 15 | time: 3782.17s | valid loss 1.0468 | valid ppl 2.8485 | learning rate 5.0000
| end of split 70 / 28 | epoch 15 | time: 3784.02s | valid loss 1.0466 | valid ppl 2.8481 | learning rate 5.0000
| end of split 71 / 28 | epoch 15 | time: 28702.54s | valid loss 1.0466 | valid ppl 2.8480 | learning rate 5.0000
| end of split 72 / 28 | epoch 15 | time: 3784.82s | valid loss 1.0466 | valid ppl 2.8479 | learning rate 5.0000
| end of split 73 / 28 | epoch 15 | time: 3783.80s | valid loss 1.0466 | valid ppl 2.8479 | learning rate 5.0000
| end of split 74 / 28 | epoch 15 | time: 3783.55s | valid loss 1.0465 | valid ppl 2.8477 | learning rate 5.0000
| end of split 75 / 28 | epoch 15 | time: 3782.97s | valid loss 1.0465 | valid ppl 2.8476 | learning rate 5.0000
| end of split 48 / 28 | epoch 16 | time: 3773.65s | valid loss 1.0465 | valid ppl 2.8478 | learning rate 5.0000
| end of split 49 / 28 | epoch 16 | time: 3775.38s | valid loss 1.0466 | valid ppl 2.8479 | learning rate 5.0000
| end of split 50 / 28 | epoch 16 | time: 1091.65s | valid loss 1.0464 | valid ppl 2.8473 | learning rate 5.0000
| end of split 51 / 28 | epoch 16 | time: 3783.99s | valid loss 1.0465 | valid ppl 2.8477 | learning rate 5.0000
| end of split 52 / 28 | epoch 16 | time: 3780.59s | valid loss 1.0465 | valid ppl 2.8476 | learning rate 5.0000
| end of split 53 / 28 | epoch 16 | time: 3784.95s | valid loss 1.0463 | valid ppl 2.8472 | learning rate 5.0000
| end of split 54 / 28 | epoch 16 | time: 3782.59s | valid loss 1.0463 | valid ppl 2.8471 | learning rate 5.0000
| end of split 55 / 28 | epoch 16 | time: 3778.11s | valid loss 1.0463 | valid ppl 2.8472 | learning rate 5.0000
| end of split 56 / 28 | epoch 16 | time: 3779.94s | valid loss 1.0464 | valid ppl 2.8474 | learning rate 5.0000
| end of split 57 / 28 | epoch 16 | time: 3777.76s | valid loss 1.0463 | valid ppl 2.8471 | learning rate 5.0000
| end of split 58 / 28 | epoch 16 | time: 3782.11s | valid loss 1.0463 | valid ppl 2.8471 | learning rate 5.0000
| end of split 59 / 28 | epoch 16 | time: 3788.61s | valid loss 1.0462 | valid ppl 2.8468 | learning rate 5.0000
| end of split 60 / 28 | epoch 16 | time: 3790.19s | valid loss 1.0462 | valid ppl 2.8468 | learning rate 5.0000
| end of split 61 / 28 | epoch 16 | time: 3777.68s | valid loss 1.0461 | valid ppl 2.8465 | learning rate 5.0000
| end of split 62 / 28 | epoch 16 | time: 3784.12s | valid loss 1.0461 | valid ppl 2.8464 | learning rate 5.0000
| end of split 63 / 28 | epoch 16 | time: 3783.63s | valid loss 1.0462 | valid ppl 2.8468 | learning rate 5.0000
| end of split 64 / 28 | epoch 16 | time: 3783.34s | valid loss 1.0460 | valid ppl 2.8464 | learning rate 5.0000
| end of split 65 / 28 | epoch 16 | time: 3784.09s | valid loss 1.0460 | valid ppl 2.8464 | learning rate 5.0000
| end of split 66 / 28 | epoch 16 | time: 3777.62s | valid loss 1.0462 | valid ppl 2.8468 | learning rate 5.0000
| end of split 67 / 28 | epoch 16 | time: 3784.19s | valid loss 1.0463 | valid ppl 2.8471 | learning rate 5.0000
| end of split 68 / 28 | epoch 16 | time: 3787.31s | valid loss 1.0461 | valid ppl 2.8466 | learning rate 5.0000
| end of split 69 / 28 | epoch 16 | time: 3788.53s | valid loss 1.0460 | valid ppl 2.8463 | learning rate 5.0000
| end of split 70 / 28 | epoch 16 | time: 3768.45s | valid loss 1.0460 | valid ppl 2.8463 | learning rate 5.0000
| end of split 71 / 28 | epoch 16 | time: 3772.25s | valid loss 1.0460 | valid ppl 2.8462 | learning rate 5.0000
| end of split 72 / 28 | epoch 16 | time: 3773.52s | valid loss 1.0461 | valid ppl 2.8465 | learning rate 5.0000
| end of split 73 / 28 | epoch 16 | time: 3775.06s | valid loss 1.0459 | valid ppl 2.8460 | learning rate 5.0000
| end of split 74 / 28 | epoch 16 | time: 1088.94s | valid loss 1.0460 | valid ppl 2.8462 | learning rate 5.0000
| end of split 75 / 28 | epoch 16 | time: 3776.94s | valid loss 1.0460 | valid ppl 2.8461 | learning rate 5.0000
| end of split 76 / 28 | epoch 16 | time: 3776.49s | valid loss 1.0458 | valid ppl 2.8458 | learning rate 5.0000
| end of split 77 / 28 | epoch 16 | time: 3760.70s | valid loss 1.0460 | valid ppl 2.8462 | learning rate 5.0000
| end of split 78 / 28 | epoch 16 | time: 3764.39s | valid loss 1.0458 | valid ppl 2.8458 | learning rate 5.0000
| end of split 79 / 28 | epoch 16 | time: 3767.48s | valid loss 1.0459 | valid ppl 2.8459 | learning rate 5.0000
| end of split 80 / 28 | epoch 16 | time: 3764.28s | valid loss 1.0458 | valid ppl 2.8457 | learning rate 5.0000
| end of split 81 / 28 | epoch 16 | time: 3763.77s | valid loss 1.0459 | valid ppl 2.8459 | learning rate 5.0000
| end of split 82 / 28 | epoch 16 | time: 3765.33s | valid loss 1.0458 | valid ppl 2.8457 | learning rate 5.0000
| end of split 83 / 28 | epoch 16 | time: 3765.05s | valid loss 1.0459 | valid ppl 2.8460 | learning rate 5.0000
| end of split 84 / 28 | epoch 16 | time: 3768.31s | valid loss 1.0460 | valid ppl 2.8462 | learning rate 5.0000
| end of split 85 / 28 | epoch 16 | time: 8461.32s | valid loss 1.0457 | valid ppl 2.8453 | learning rate 5.0000
| end of split 86 / 28 | epoch 16 | time: 3758.07s | valid loss 1.0457 | valid ppl 2.8455 | learning rate 5.0000
| end of split 87 / 28 | epoch 16 | time: 3768.50s | valid loss 1.0456 | valid ppl 2.8451 | learning rate 5.0000
| end of split 88 / 28 | epoch 16 | time: 3769.08s | valid loss 1.0455 | valid ppl 2.8448 | learning rate 5.0000
| end of split 89 / 28 | epoch 16 | time: 3770.56s | valid loss 1.0455 | valid ppl 2.8447 | learning rate 5.0000
| end of split 90 / 28 | epoch 16 | time: 14005.30s | valid loss 1.0457 | valid ppl 2.8454 | learning rate 5.0000
| end of split 91 / 28 | epoch 16 | time: 3769.87s | valid loss 1.0456 | valid ppl 2.8452 | learning rate 5.0000
| end of split 92 / 28 | epoch 16 | time: 3767.83s | valid loss 1.0456 | valid ppl 2.8450 | learning rate 5.0000
| end of split 93 / 28 | epoch 16 | time: 3773.59s | valid loss 1.0455 | valid ppl 2.8447 | learning rate 5.0000
| end of split 94 / 28 | epoch 16 | time: 3773.86s | valid loss 1.0454 | valid ppl 2.8445 | learning rate 5.0000
| end of split 95 / 28 | epoch 16 | time: 3775.08s | valid loss 1.0454 | valid ppl 2.8446 | learning rate 5.0000
| end of split 96 / 28 | epoch 16 | time: 3768.97s | valid loss 1.0454 | valid ppl 2.8446 | learning rate 5.0000
| end of split 97 / 28 | epoch 16 | time: 3768.56s | valid loss 1.0454 | valid ppl 2.8444 | learning rate 5.0000
| end of split 70 / 28 | epoch 17 | time: 3766.49s | valid loss 1.0454 | valid ppl 2.8446 | learning rate 5.0000
| end of split 71 / 28 | epoch 17 | time: 3768.43s | valid loss 1.0453 | valid ppl 2.8442 | learning rate 5.0000
| end of split 72 / 28 | epoch 17 | time: 3769.70s | valid loss 1.0452 | valid ppl 2.8440 | learning rate 5.0000
| end of split 73 / 28 | epoch 17 | time: 3767.36s | valid loss 1.0452 | valid ppl 2.8441 | learning rate 5.0000
| end of split 74 / 28 | epoch 17 | time: 3766.45s | valid loss 1.0453 | valid ppl 2.8443 | learning rate 5.0000
| end of split 75 / 28 | epoch 17 | time: 3770.24s | valid loss 1.0452 | valid ppl 2.8438 | learning rate 5.0000
| end of split 76 / 28 | epoch 17 | time: 3769.41s | valid loss 1.0452 | valid ppl 2.8440 | learning rate 5.0000
| end of split 77 / 28 | epoch 17 | time: 3768.07s | valid loss 1.0453 | valid ppl 2.8442 | learning rate 5.0000
| end of split 78 / 28 | epoch 17 | time: 3766.95s | valid loss 1.0452 | valid ppl 2.8439 | learning rate 5.0000
| end of split 79 / 28 | epoch 17 | time: 3771.74s | valid loss 1.0451 | valid ppl 2.8438 | learning rate 5.0000
| end of split 80 / 28 | epoch 17 | time: 3767.33s | valid loss 1.0451 | valid ppl 2.8438 | learning rate 5.0000
| end of split 81 / 28 | epoch 17 | time: 1090.84s | valid loss 1.0451 | valid ppl 2.8437 | learning rate 5.0000
| end of split 82 / 28 | epoch 17 | time: 3767.97s | valid loss 1.0452 | valid ppl 2.8439 | learning rate 5.0000
| end of split 83 / 28 | epoch 17 | time: 3769.03s | valid loss 1.0451 | valid ppl 2.8436 | learning rate 5.0000
| end of split 84 / 28 | epoch 17 | time: 3775.66s | valid loss 1.0449 | valid ppl 2.8432 | learning rate 5.0000
| end of split 85 / 28 | epoch 17 | time: 3779.76s | valid loss 1.0452 | valid ppl 2.8439 | learning rate 5.0000
| end of split 86 / 28 | epoch 17 | time: 3784.33s | valid loss 1.0450 | valid ppl 2.8434 | learning rate 5.0000
| end of split 87 / 28 | epoch 17 | time: 3776.78s | valid loss 1.0448 | valid ppl 2.8429 | learning rate 5.0000
| end of split 88 / 28 | epoch 17 | time: 3774.18s | valid loss 1.0449 | valid ppl 2.8432 | learning rate 5.0000
| end of split 89 / 28 | epoch 17 | time: 3780.26s | valid loss 1.0448 | valid ppl 2.8428 | learning rate 5.0000
| end of split 90 / 28 | epoch 17 | time: 3785.91s | valid loss 1.0451 | valid ppl 2.8438 | learning rate 5.0000
| end of split 91 / 28 | epoch 17 | time: 3777.93s | valid loss 1.0450 | valid ppl 2.8433 | learning rate 5.0000
| end of split 92 / 28 | epoch 17 | time: 3783.56s | valid loss 1.0448 | valid ppl 2.8427 | learning rate 5.0000
| end of split 93 / 28 | epoch 17 | time: 3786.24s | valid loss 1.0448 | valid ppl 2.8428 | learning rate 5.0000
| end of split 94 / 28 | epoch 17 | time: 3787.14s | valid loss 1.0448 | valid ppl 2.8428 | learning rate 5.0000
| end of split 95 / 28 | epoch 17 | time: 3778.01s | valid loss 1.0448 | valid ppl 2.8429 | learning rate 5.0000
| end of split 96 / 28 | epoch 17 | time: 1091.34s | valid loss 1.0447 | valid ppl 2.8426 | learning rate 5.0000
| end of split 97 / 28 | epoch 17 | time: 3788.24s | valid loss 1.0448 | valid ppl 2.8430 | learning rate 5.0000
| end of split 98 / 28 | epoch 17 | time: 3783.64s | valid loss 1.0447 | valid ppl 2.8426 | learning rate 5.0000
| end of split 99 / 28 | epoch 17 | time: 3778.02s | valid loss 1.0447 | valid ppl 2.8425 | learning rate 5.0000
| end of split 100 / 28 | epoch 17 | time: 3780.18s | valid loss 1.0446 | valid ppl 2.8424 | learning rate 5.0000
| end of split 101 / 28 | epoch 17 | time: 3787.06s | valid loss 1.0446 | valid ppl 2.8423 | learning rate 5.0000
| end of split 102 / 28 | epoch 17 | time: 3779.38s | valid loss 1.0447 | valid ppl 2.8427 | learning rate 5.0000
| end of split 103 / 28 | epoch 17 | time: 3779.40s | valid loss 1.0445 | valid ppl 2.8421 | learning rate 5.0000
| end of split 104 / 28 | epoch 17 | time: 3782.58s | valid loss 1.0446 | valid ppl 2.8423 | learning rate 5.0000
| end of split 105 / 28 | epoch 17 | time: 3785.37s | valid loss 1.0446 | valid ppl 2.8423 | learning rate 5.0000
| end of split 106 / 28 | epoch 17 | time: 3783.39s | valid loss 1.0445 | valid ppl 2.8420 | learning rate 5.0000
| end of split 107 / 28 | epoch 17 | time: 3788.59s | valid loss 1.0445 | valid ppl 2.8420 | learning rate 5.0000
| end of split 108 / 28 | epoch 17 | time: 3791.43s | valid loss 1.0445 | valid ppl 2.8420 | learning rate 5.0000
| end of split 109 / 28 | epoch 17 | time: 3783.75s | valid loss 1.0444 | valid ppl 2.8417 | learning rate 5.0000
| end of split 110 / 28 | epoch 17 | time: 3783.79s | valid loss 1.0444 | valid ppl 2.8417 | learning rate 5.0000
| end of split 111 / 28 | epoch 17 | time: 3792.08s | valid loss 1.0444 | valid ppl 2.8416 | learning rate 5.0000
| end of split 84 / 28 | epoch 18 | time: 3782.22s | valid loss 1.0444 | valid ppl 2.8417 | learning rate 5.0000
| end of split 85 / 28 | epoch 18 | time: 3790.36s | valid loss 1.0442 | valid ppl 2.8413 | learning rate 5.0000
| end of split 86 / 28 | epoch 18 | time: 3787.88s | valid loss 1.0443 | valid ppl 2.8415 | learning rate 5.0000
| end of split 87 / 28 | epoch 18 | time: 3788.66s | valid loss 1.0442 | valid ppl 2.8412 | learning rate 5.0000
| end of split 88 / 28 | epoch 18 | time: 3789.25s | valid loss 1.0446 | valid ppl 2.8422 | learning rate 5.0000
| end of split 89 / 28 | epoch 18 | time: 3787.81s | valid loss 1.0443 | valid ppl 2.8415 | learning rate 5.0000
| end of split 90 / 28 | epoch 18 | time: 3788.52s | valid loss 1.0443 | valid ppl 2.8414 | learning rate 5.0000
| end of split 91 / 28 | epoch 18 | time: 3778.27s | valid loss 1.0443 | valid ppl 2.8414 | learning rate 5.0000
| end of split 92 / 28 | epoch 18 | time: 3778.64s | valid loss 1.0444 | valid ppl 2.8416 | learning rate 5.0000
| end of split 93 / 28 | epoch 18 | time: 3783.70s | valid loss 1.0443 | valid ppl 2.8414 | learning rate 5.0000
| end of split 94 / 28 | epoch 18 | time: 3787.30s | valid loss 1.0441 | valid ppl 2.8408 | learning rate 5.0000
| end of split 95 / 28 | epoch 18 | time: 3787.36s | valid loss 1.0442 | valid ppl 2.8411 | learning rate 5.0000
| end of split 96 / 28 | epoch 18 | time: 3784.83s | valid loss 1.0441 | valid ppl 2.8409 | learning rate 5.0000
| end of split 97 / 28 | epoch 18 | time: 3788.12s | valid loss 1.0442 | valid ppl 2.8410 | learning rate 5.0000
| end of split 98 / 28 | epoch 18 | time: 3783.89s | valid loss 1.0441 | valid ppl 2.8408 | learning rate 5.0000
| end of split 99 / 28 | epoch 18 | time: 3786.66s | valid loss 1.0447 | valid ppl 2.8426 | learning rate 5.0000
| end of split 100 / 28 | epoch 18 | time: 3784.01s | valid loss 1.0439 | valid ppl 2.8404 | learning rate 5.0000
| end of split 101 / 28 | epoch 18 | time: 3785.51s | valid loss 1.0440 | valid ppl 2.8406 | learning rate 5.0000
| end of split 102 / 28 | epoch 18 | time: 3786.84s | valid loss 1.0440 | valid ppl 2.8406 | learning rate 5.0000
| end of split 103 / 28 | epoch 18 | time: 3788.65s | valid loss 1.0440 | valid ppl 2.8406 | learning rate 5.0000
| end of split 104 / 28 | epoch 18 | time: 3781.44s | valid loss 1.0439 | valid ppl 2.8403 | learning rate 5.0000
| end of split 105 / 28 | epoch 18 | time: 3782.71s | valid loss 1.0439 | valid ppl 2.8402 | learning rate 5.0000
| end of split 106 / 28 | epoch 18 | time: 3785.08s | valid loss 1.0439 | valid ppl 2.8404 | learning rate 5.0000
| end of split 107 / 28 | epoch 18 | time: 3787.80s | valid loss 1.0439 | valid ppl 2.8402 | learning rate 5.0000
| end of split 108 / 28 | epoch 18 | time: 3787.92s | valid loss 1.0439 | valid ppl 2.8404 | learning rate 5.0000
| end of split 109 / 28 | epoch 18 | time: 3789.70s | valid loss 1.0439 | valid ppl 2.8402 | learning rate 5.0000
| end of split 110 / 28 | epoch 18 | time: 3789.33s | valid loss 1.0438 | valid ppl 2.8399 | learning rate 5.0000
| end of split 111 / 28 | epoch 18 | time: 3791.71s | valid loss 1.0438 | valid ppl 2.8401 | learning rate 5.0000
| end of split 112 / 28 | epoch 18 | time: 3788.65s | valid loss 1.0442 | valid ppl 2.8412 | learning rate 5.0000
| end of split 113 / 28 | epoch 18 | time: 3789.44s | valid loss 1.0440 | valid ppl 2.8405 | learning rate 5.0000
| end of split 114 / 28 | epoch 18 | time: 3794.41s | valid loss 1.0442 | valid ppl 2.8412 | learning rate 5.0000
| end of split 115 / 28 | epoch 18 | time: 3795.64s | valid loss 1.0439 | valid ppl 2.8403 | learning rate 5.0000
| end of split 116 / 28 | epoch 18 | time: 3796.36s | valid loss 1.0437 | valid ppl 2.8398 | learning rate 5.0000
| end of split 117 / 28 | epoch 18 | time: 3797.35s | valid loss 1.0437 | valid ppl 2.8396 | learning rate 5.0000
| end of split 118 / 28 | epoch 18 | time: 1095.73s | valid loss 1.0438 | valid ppl 2.8399 | learning rate 5.0000
| end of split 91 / 28 | epoch 19 | time: 3794.07s | valid loss 1.0436 | valid ppl 2.8396 | learning rate 5.0000
| end of split 92 / 28 | epoch 19 | time: 3795.57s | valid loss 1.0437 | valid ppl 2.8397 | learning rate 5.0000
| end of split 93 / 28 | epoch 19 | time: 3798.43s | valid loss 1.0438 | valid ppl 2.8400 | learning rate 5.0000
| end of split 94 / 28 | epoch 19 | time: 3796.37s | valid loss 1.0436 | valid ppl 2.8395 | learning rate 5.0000
| end of split 95 / 28 | epoch 19 | time: 3793.75s | valid loss 1.0435 | valid ppl 2.8392 | learning rate 5.0000
| end of split 96 / 28 | epoch 19 | time: 3796.43s | valid loss 1.0436 | valid ppl 2.8393 | learning rate 5.0000
| end of split 97 / 28 | epoch 19 | time: 3795.62s | valid loss 1.0435 | valid ppl 2.8392 | learning rate 5.0000
| end of split 98 / 28 | epoch 19 | time: 3796.90s | valid loss 1.0435 | valid ppl 2.8391 | learning rate 5.0000
| end of split 99 / 28 | epoch 19 | time: 3800.39s | valid loss 1.0434 | valid ppl 2.8389 | learning rate 5.0000
| end of split 100 / 28 | epoch 19 | time: 3797.05s | valid loss 1.0435 | valid ppl 2.8390 | learning rate 5.0000
| end of split 101 / 28 | epoch 19 | time: 3797.44s | valid loss 1.0435 | valid ppl 2.8391 | learning rate 5.0000
TEST: valid loss 1.0437 | valid ppl 2.8396
| end of split 102 / 28 | epoch 19 | time: 3794.36s | valid loss 1.0438 | valid ppl 2.8399 | learning rate 5.0000
| end of split 103 / 28 | epoch 19 | time: 3806.73s | valid loss 1.0435 | valid ppl 2.8393 | learning rate 5.0000
| end of split 104 / 28 | epoch 19 | time: 3802.22s | valid loss 1.0434 | valid ppl 2.8388 | learning rate 5.0000
| end of split 105 / 28 | epoch 19 | time: 3802.72s | valid loss 1.0434 | valid ppl 2.8389 | learning rate 5.0000
| end of split 106 / 28 | epoch 19 | time: 3798.73s | valid loss 1.0435 | valid ppl 2.8390 | learning rate 5.0000
| end of split 107 / 28 | epoch 19 | time: 3801.41s | valid loss 1.0435 | valid ppl 2.8390 | learning rate 5.0000
| end of split 108 / 28 | epoch 19 | time: 3796.79s | valid loss 1.0435 | valid ppl 2.8390 | learning rate 5.0000
| end of split 109 / 28 | epoch 19 | time: 3797.70s | valid loss 1.0434 | valid ppl 2.8388 | learning rate 5.0000
| end of split 110 / 28 | epoch 19 | time: 3798.06s | valid loss 1.0436 | valid ppl 2.8393 | learning rate 5.0000
| end of split 111 / 28 | epoch 19 | time: 3798.93s | valid loss 1.0434 | valid ppl 2.8387 | learning rate 5.0000
| end of split 112 / 28 | epoch 19 | time: 3800.72s | valid loss 1.0435 | valid ppl 2.8392 | learning rate 5.0000
| end of split 113 / 28 | epoch 19 | time: 3803.27s | valid loss 1.0433 | valid ppl 2.8385 | learning rate 5.0000
| end of split 114 / 28 | epoch 19 | time: 3798.19s | valid loss 1.0432 | valid ppl 2.8384 | learning rate 5.0000
| end of split 115 / 28 | epoch 19 | time: 3803.30s | valid loss 1.0434 | valid ppl 2.8388 | learning rate 5.0000
| end of split 116 / 28 | epoch 19 | time: 3799.43s | valid loss 1.0433 | valid ppl 2.8387 | learning rate 5.0000
| end of split 117 / 28 | epoch 19 | time: 1095.91s | valid loss 1.0432 | valid ppl 2.8384 | learning rate 5.0000
| end of split 118 / 28 | epoch 19 | time: 3784.22s | valid loss 1.0430 | valid ppl 2.8377 | learning rate 5.0000
| end of split 119 / 28 | epoch 19 | time: 3789.87s | valid loss 1.0431 | valid ppl 2.8381 | learning rate 5.0000
| end of split 120 / 28 | epoch 19 | time: 3792.03s | valid loss 1.0431 | valid ppl 2.8379 | learning rate 5.0000
| end of split 121 / 28 | epoch 19 | time: 3787.89s | valid loss 1.0431 | valid ppl 2.8380 | learning rate 5.0000
| end of split 122 / 28 | epoch 19 | time: 3789.35s | valid loss 1.0430 | valid ppl 2.8377 | learning rate 5.0000
| end of split 123 / 28 | epoch 19 | time: 3787.37s | valid loss 1.0431 | valid ppl 2.8380 | learning rate 5.0000
| end of split 124 / 28 | epoch 19 | time: 3789.74s | valid loss 1.0431 | valid ppl 2.8381 | learning rate 5.0000
| end of split 125 / 28 | epoch 19 | time: 3790.80s | valid loss 1.0430 | valid ppl 2.8376 | learning rate 5.0000
| end of split 126 / 28 | epoch 19 | time: 3794.24s | valid loss 1.0429 | valid ppl 2.8376 | learning rate 5.0000
| end of split 127 / 28 | epoch 19 | time: 3796.27s | valid loss 1.0430 | valid ppl 2.8376 | learning rate 5.0000
| end of split 128 / 28 | epoch 19 | time: 3794.78s | valid loss 1.0428 | valid ppl 2.8372 | learning rate 5.0000
| end of split 129 / 28 | epoch 19 | time: 3796.47s | valid loss 1.0428 | valid ppl 2.8372 | learning rate 5.0000
| end of split 130 / 28 | epoch 19 | time: 3796.06s | valid loss 1.0430 | valid ppl 2.8378 | learning rate 5.0000
| end of split 131 / 28 | epoch 19 | time: 3792.71s | valid loss 1.0439 | valid ppl 2.8403 | learning rate 5.0000
| end of split 132 / 28 | epoch 19 | time: 3789.55s | valid loss 1.0430 | valid ppl 2.8377 | learning rate 5.0000
| end of split 133 / 28 | epoch 19 | time: 3784.97s | valid loss 1.0429 | valid ppl 2.8375 | learning rate 5.0000
| end of split 134 / 28 | epoch 19 | time: 19320.95s | valid loss 1.0428 | valid ppl 2.8372 | learning rate 5.0000
| end of split 135 / 28 | epoch 19 | time: 3771.30s | valid loss 1.0430 | valid ppl 2.8377 | learning rate 5.0000
| end of split 136 / 28 | epoch 19 | time: 3775.60s | valid loss 1.0428 | valid ppl 2.8372 | learning rate 5.0000
| end of split 137 / 28 | epoch 19 | time: 3774.11s | valid loss 1.0428 | valid ppl 2.8371 | learning rate 5.0000
| end of split 138 / 28 | epoch 19 | time: 3776.83s | valid loss 1.0427 | valid ppl 2.8369 | learning rate 5.0000
| end of split 139 / 28 | epoch 19 | time: 3775.57s | valid loss 1.0426 | valid ppl 2.8367 | learning rate 5.0000
| end of split 140 / 28 | epoch 19 | time: 3774.84s | valid loss 1.0426 | valid ppl 2.8367 | learning rate 5.0000
| end of split 141 / 28 | epoch 19 | time: 1089.42s | valid loss 1.0429 | valid ppl 2.8373 | learning rate 5.0000
| end of split 142 / 28 | epoch 19 | time: 3781.20s | valid loss 1.0427 | valid ppl 2.8369 | learning rate 5.0000
| end of split 143 / 28 | epoch 19 | time: 3774.42s | valid loss 1.0426 | valid ppl 2.8367 | learning rate 5.0000
| end of split 144 / 28 | epoch 19 | time: 3771.17s | valid loss 1.0426 | valid ppl 2.8366 | learning rate 5.0000
| end of split 145 / 28 | epoch 19 | time: 3774.23s | valid loss 1.0425 | valid ppl 2.8364 | learning rate 5.0000
| end of split 118 / 28 | epoch 20 | time: 3768.43s | valid loss 1.0426 | valid ppl 2.8366 | learning rate 5.0000
| end of split 119 / 28 | epoch 20 | time: 3769.09s | valid loss 1.0425 | valid ppl 2.8362 | learning rate 5.0000
| end of split 120 / 28 | epoch 20 | time: 3773.51s | valid loss 1.0425 | valid ppl 2.8363 | learning rate 5.0000
| end of split 121 / 28 | epoch 20 | time: 3773.63s | valid loss 1.0426 | valid ppl 2.8365 | learning rate 5.0000
| end of split 122 / 28 | epoch 20 | time: 3772.94s | valid loss 1.0427 | valid ppl 2.8369 | learning rate 5.0000
| end of split 123 / 28 | epoch 20 | time: 3772.27s | valid loss 1.0426 | valid ppl 2.8365 | learning rate 5.0000
| end of split 124 / 28 | epoch 20 | time: 3772.87s | valid loss 1.0424 | valid ppl 2.8359 | learning rate 5.0000
| end of split 125 / 28 | epoch 20 | time: 3772.19s | valid loss 1.0424 | valid ppl 2.8361 | learning rate 5.0000
| end of split 126 / 28 | epoch 20 | time: 1090.24s | valid loss 1.0424 | valid ppl 2.8360 | learning rate 5.0000
| end of split 127 / 28 | epoch 20 | time: 3775.47s | valid loss 1.0424 | valid ppl 2.8360 | learning rate 5.0000
| end of split 128 / 28 | epoch 20 | time: 3773.74s | valid loss 1.0427 | valid ppl 2.8367 | learning rate 5.0000
| end of split 129 / 28 | epoch 20 | time: 3783.61s | valid loss 1.0425 | valid ppl 2.8363 | learning rate 5.0000
| end of split 130 / 28 | epoch 20 | time: 3782.31s | valid loss 1.0424 | valid ppl 2.8359 | learning rate 5.0000
| end of split 131 / 28 | epoch 20 | time: 3783.41s | valid loss 1.0424 | valid ppl 2.8361 | learning rate 5.0000
| end of split 132 / 28 | epoch 20 | time: 3779.19s | valid loss 1.0424 | valid ppl 2.8359 | learning rate 5.0000
| end of split 133 / 28 | epoch 20 | time: 3779.36s | valid loss 1.0421 | valid ppl 2.8352 | learning rate 5.0000
| end of split 134 / 28 | epoch 20 | time: 3781.11s | valid loss 1.0422 | valid ppl 2.8355 | learning rate 5.0000
| end of split 135 / 28 | epoch 20 | time: 3779.04s | valid loss 1.0423 | valid ppl 2.8357 | learning rate 5.0000
| end of split 136 / 28 | epoch 20 | time: 3777.40s | valid loss 1.0422 | valid ppl 2.8355 | learning rate 5.0000
| end of split 137 / 28 | epoch 20 | time: 3783.11s | valid loss 1.0422 | valid ppl 2.8354 | learning rate 5.0000
| end of split 138 / 28 | epoch 20 | time: 3781.75s | valid loss 1.0422 | valid ppl 2.8354 | learning rate 5.0000
| end of split 139 / 28 | epoch 20 | time: 3785.32s | valid loss 1.0423 | valid ppl 2.8356 | learning rate 5.0000
| end of split 140 / 28 | epoch 20 | time: 3785.28s | valid loss 1.0421 | valid ppl 2.8353 | learning rate 5.0000
| end of split 141 / 28 | epoch 20 | time: 3786.08s | valid loss 1.0423 | valid ppl 2.8357 | learning rate 5.0000
| end of split 142 / 28 | epoch 20 | time: 3782.96s | valid loss 1.0421 | valid ppl 2.8353 | learning rate 5.0000
| end of split 143 / 28 | epoch 20 | time: 3786.43s | valid loss 1.0421 | valid ppl 2.8351 | learning rate 5.0000
| end of split 144 / 28 | epoch 20 | time: 3786.33s | valid loss 1.0421 | valid ppl 2.8351 | learning rate 5.0000
| end of split 145 / 28 | epoch 20 | time: 3786.54s | valid loss 1.0417 | valid ppl 2.8339 | learning rate 1.2500
TEST: valid loss 1.0419 | valid ppl 2.8346
| end of split 146 / 28 | epoch 20 | time: 3774.24s | valid loss 1.0416 | valid ppl 2.8338 | learning rate 1.2500
| end of split 147 / 28 | epoch 20 | time: 3776.03s | valid loss 1.0418 | valid ppl 2.8343 | learning rate 1.2500
| end of split 148 / 28 | epoch 20 | time: 3773.23s | valid loss 1.0416 | valid ppl 2.8339 | learning rate 1.2500
| end of split 149 / 28 | epoch 20 | time: 3776.77s | valid loss 1.0416 | valid ppl 2.8339 | learning rate 1.2500
| end of split 150 / 28 | epoch 20 | time: 3774.30s | valid loss 1.0416 | valid ppl 2.8337 | learning rate 1.2500
| end of split 151 / 28 | epoch 20 | time: 3773.24s | valid loss 1.0416 | valid ppl 2.8337 | learning rate 1.2500
| end of split 152 / 28 | epoch 20 | time: 3777.30s | valid loss 1.0415 | valid ppl 2.8336 | learning rate 1.2500
| end of split 154 / 28 | epoch 20 | time: 3785.00s | valid loss 1.0415 | valid ppl 2.8336 | learning rate 1.2500
| end of split 155 / 28 | epoch 20 | time: 3793.24s | valid loss 1.0415 | valid ppl 2.8334 | learning rate 1.2500
| end of split 156 / 28 | epoch 20 | time: 3797.87s | valid loss 1.0415 | valid ppl 2.8334 | learning rate 1.2500
| end of split 157 / 28 | epoch 20 | time: 3796.65s | valid loss 1.0415 | valid ppl 2.8333 | learning rate 1.2500
| end of split 158 / 28 | epoch 20 | time: 3797.26s | valid loss 1.0415 | valid ppl 2.8334 | learning rate 1.2500
| end of split 159 / 28 | epoch 20 | time: 3795.40s | valid loss 1.0415 | valid ppl 2.8334 | learning rate 1.2500
| end of split 160 / 28 | epoch 20 | time: 3793.63s | valid loss 1.0415 | valid ppl 2.8334 | learning rate 1.2500
| end of split 161 / 28 | epoch 20 | time: 3796.31s | valid loss 1.0414 | valid ppl 2.8332 | learning rate 1.2500
| end of split 162 / 28 | epoch 20 | time: 3798.67s | valid loss 1.0413 | valid ppl 2.8330 | learning rate 1.2500
| end of split 163 / 28 | epoch 20 | time: 3796.81s | valid loss 1.0414 | valid ppl 2.8332 | learning rate 1.2500
| end of split 164 / 28 | epoch 20 | time: 3785.17s | valid loss 1.0414 | valid ppl 2.8331 | learning rate 1.2500
| end of split 165 / 28 | epoch 20 | time: 3780.02s | valid loss 1.0414 | valid ppl 2.8330 | learning rate 1.2500
| end of split 166 / 28 | epoch 20 | time: 3786.33s | valid loss 1.0414 | valid ppl 2.8331 | learning rate 1.2500
| end of split 167 / 28 | epoch 20 | time: 3785.67s | valid loss 1.0413 | valid ppl 2.8330 | learning rate 1.2500
| end of split 168 / 28 | epoch 20 | time: 3787.96s | valid loss 1.0414 | valid ppl 2.8331 | learning rate 1.2500
| end of split 169 / 28 | epoch 20 | time: 3786.45s | valid loss 1.0414 | valid ppl 2.8331 | learning rate 1.2500
| end of split 170 / 28 | epoch 20 | time: 3786.48s | valid loss 1.0414 | valid ppl 2.8330 | learning rate 1.2500
| end of split 171 / 28 | epoch 20 | time: 3785.95s | valid loss 1.0414 | valid ppl 2.8331 | learning rate 1.2500
| end of split 172 / 28 | epoch 20 | time: 3787.46s | valid loss 1.0413 | valid ppl 2.8330 | learning rate 1.2500
| end of split 173 / 28 | epoch 20 | time: 1094.62s | valid loss 1.0413 | valid ppl 2.8329 | learning rate 1.2500
| end of split 174 / 28 | epoch 20 | time: 3790.15s | valid loss 1.0413 | valid ppl 2.8330 | learning rate 1.2500
| end of split 175 / 28 | epoch 20 | time: 3784.64s | valid loss 1.0413 | valid ppl 2.8330 | learning rate 1.2500
| end of split 176 / 28 | epoch 20 | time: 3787.92s | valid loss 1.0414 | valid ppl 2.8331 | learning rate 1.2500
| end of split 177 / 28 | epoch 20 | time: 3786.48s | valid loss 1.0413 | valid ppl 2.8330 | learning rate 1.2500
| end of split 178 / 28 | epoch 20 | time: 3789.78s | valid loss 1.0413 | valid ppl 2.8328 | learning rate 0.3125
| end of split 179 / 28 | epoch 20 | time: 3785.01s | valid loss 1.0412 | valid ppl 2.8327 | learning rate 0.3125
| end of split 180 / 28 | epoch 20 | time: 3787.49s | valid loss 1.0412 | valid ppl 2.8327 | learning rate 0.3125
| end of split 181 / 28 | epoch 20 | time: 3787.49s | valid loss 1.0412 | valid ppl 2.8326 | learning rate 0.3125
| end of split 182 / 28 | epoch 20 | time: 3790.98s | valid loss 1.0412 | valid ppl 2.8326 | learning rate 0.3125
| end of split 183 / 28 | epoch 20 | time: 3786.09s | valid loss 1.0412 | valid ppl 2.8326 | learning rate 0.3125
| end of split 184 / 28 | epoch 20 | time: 3789.24s | valid loss 1.0412 | valid ppl 2.8326 | learning rate 0.3125
| end of split 185 / 28 | epoch 20 | time: 3790.17s | valid loss 1.0412 | valid ppl 2.8325 | learning rate 0.3125
| end of split 186 / 28 | epoch 20 | time: 3786.89s | valid loss 1.0412 | valid ppl 2.8326 | learning rate 0.3125
| end of split 187 / 28 | epoch 20 | time: 3787.30s | valid loss 1.0412 | valid ppl 2.8325 | learning rate 0.3125
| end of split 188 / 28 | epoch 20 | time: 3784.79s | valid loss 1.0412 | valid ppl 2.8325 | learning rate 0.3125
| end of split 189 / 28 | epoch 20 | time: 3783.56s | valid loss 1.0412 | valid ppl 2.8325 | learning rate 0.3125
| end of split 190 / 28 | epoch 20 | time: 3789.72s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0781
| end of split 191 / 28 | epoch 20 | time: 3800.19s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 192 / 28 | epoch 20 | time: 3802.26s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 193 / 28 | epoch 20 | time: 3799.08s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 194 / 28 | epoch 20 | time: 3799.75s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 195 / 28 | epoch 20 | time: 3797.94s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0781
| end of split 196 / 28 | epoch 20 | time: 3803.31s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 197 / 28 | epoch 20 | time: 3802.01s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 198 / 28 | epoch 20 | time: 3801.69s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 199 / 28 | epoch 20 | time: 3800.12s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 200 / 28 | epoch 20 | time: 3790.88s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 201 / 28 | epoch 20 | time: 3786.15s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0781
| end of split 202 / 28 | epoch 20 | time: 3786.84s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0195
| end of split 203 / 28 | epoch 20 | time: 3774.92s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 204 / 28 | epoch 20 | time: 3774.54s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 205 / 28 | epoch 20 | time: 3773.10s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 206 / 28 | epoch 20 | time: 3776.61s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0195
| end of split 207 / 28 | epoch 20 | time: 3777.30s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 208 / 28 | epoch 20 | time: 5363.05s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 209 / 28 | epoch 20 | time: 3770.27s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 210 / 28 | epoch 20 | time: 3776.92s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 211 / 28 | epoch 20 | time: 3775.37s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 212 / 28 | epoch 20 | time: 3777.34s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0195
| end of split 213 / 28 | epoch 20 | time: 3776.31s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0195
| end of split 214 / 28 | epoch 20 | time: 3777.03s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0195
| end of split 215 / 28 | epoch 20 | time: 3775.57s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0049
| end of split 216 / 28 | epoch 20 | time: 3776.52s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0049
| end of split 217 / 28 | epoch 20 | time: 3778.71s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0049
| end of split 218 / 28 | epoch 20 | time: 3776.20s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0049
| end of split 219 / 28 | epoch 20 | time: 1090.35s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0049
| end of split 220 / 28 | epoch 20 | time: 3779.38s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0049
| end of split 221 / 28 | epoch 20 | time: 3778.80s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0049
| end of split 222 / 28 | epoch 20 | time: 3774.20s | valid loss 1.0411 | valid ppl 2.8325 | learning rate 0.0049
| end of split 223 / 28 | epoch 20 | time: 3776.67s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0049
| end of split 224 / 28 | epoch 20 | time: 3777.20s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0049
| end of split 225 / 28 | epoch 20 | time: 3777.42s | valid loss 1.0411 | valid ppl 2.8324 | learning rate 0.0049