Dmitry Chaplinsky
Adding everything
1e0215d
| end of split 1 / 62 | epoch 1 | time: 1603.89s | valid loss 1.4399 | valid ppl 4.2204 | learning rate 20.0000
| end of split 2 / 62 | epoch 1 | time: 1607.81s | valid loss 1.2745 | valid ppl 3.5770 | learning rate 20.0000
| end of split 3 / 62 | epoch 1 | time: 1606.22s | valid loss 1.2037 | valid ppl 3.3323 | learning rate 20.0000
| end of split 4 / 62 | epoch 1 | time: 1606.92s | valid loss 1.1638 | valid ppl 3.2020 | learning rate 20.0000
| end of split 5 / 62 | epoch 1 | time: 1607.10s | valid loss 1.1394 | valid ppl 3.1250 | learning rate 20.0000
| end of split 6 / 62 | epoch 1 | time: 1607.63s | valid loss 1.1180 | valid ppl 3.0588 | learning rate 20.0000
| end of split 7 / 62 | epoch 1 | time: 1608.12s | valid loss 1.1052 | valid ppl 3.0200 | learning rate 20.0000
| end of split 8 / 62 | epoch 1 | time: 1608.18s | valid loss 1.0969 | valid ppl 2.9948 | learning rate 20.0000
| end of split 9 / 62 | epoch 1 | time: 1592.98s | valid loss 1.0812 | valid ppl 2.9482 | learning rate 20.0000
| end of split 10 / 62 | epoch 1 | time: 1597.67s | valid loss 1.0791 | valid ppl 2.9420 | learning rate 20.0000
| end of split 11 / 62 | epoch 1 | time: 1598.41s | valid loss 1.0690 | valid ppl 2.9124 | learning rate 20.0000
| end of split 12 / 62 | epoch 1 | time: 1594.52s | valid loss 1.0625 | valid ppl 2.8937 | learning rate 20.0000
| end of split 13 / 62 | epoch 1 | time: 1595.52s | valid loss 1.0584 | valid ppl 2.8816 | learning rate 20.0000
| end of split 14 / 62 | epoch 1 | time: 1593.63s | valid loss 1.0520 | valid ppl 2.8634 | learning rate 20.0000
| end of split 15 / 62 | epoch 1 | time: 1593.45s | valid loss 1.1233 | valid ppl 3.0750 | learning rate 20.0000
| end of split 16 / 62 | epoch 1 | time: 1594.20s | valid loss 1.0477 | valid ppl 2.8511 | learning rate 20.0000
| end of split 17 / 62 | epoch 1 | time: 1594.12s | valid loss 1.0393 | valid ppl 2.8274 | learning rate 20.0000
| end of split 18 / 62 | epoch 1 | time: 1592.60s | valid loss 1.0382 | valid ppl 2.8242 | learning rate 20.0000
| end of split 19 / 62 | epoch 1 | time: 1591.84s | valid loss 1.0321 | valid ppl 2.8071 | learning rate 20.0000
| end of split 20 / 62 | epoch 1 | time: 1591.25s | valid loss 1.0335 | valid ppl 2.8109 | learning rate 20.0000
| end of split 21 / 62 | epoch 1 | time: 1593.49s | valid loss 1.0276 | valid ppl 2.7944 | learning rate 20.0000
| end of split 22 / 62 | epoch 1 | time: 1590.55s | valid loss 1.0265 | valid ppl 2.7913 | learning rate 20.0000
| end of split 23 / 62 | epoch 1 | time: 1591.47s | valid loss 1.0218 | valid ppl 2.7781 | learning rate 20.0000
| end of split 24 / 62 | epoch 1 | time: 1589.39s | valid loss 1.0218 | valid ppl 2.7781 | learning rate 20.0000
| end of split 25 / 62 | epoch 1 | time: 1591.76s | valid loss 1.0182 | valid ppl 2.7682 | learning rate 20.0000
| end of split 26 / 62 | epoch 1 | time: 1586.71s | valid loss 1.0198 | valid ppl 2.7726 | learning rate 20.0000
| end of split 27 / 62 | epoch 1 | time: 1584.62s | valid loss 1.0144 | valid ppl 2.7578 | learning rate 20.0000
| end of split 28 / 62 | epoch 1 | time: 1586.04s | valid loss 1.0124 | valid ppl 2.7521 | learning rate 20.0000
| end of split 29 / 62 | epoch 1 | time: 1583.84s | valid loss 1.0164 | valid ppl 2.7633 | learning rate 20.0000
| end of split 30 / 62 | epoch 1 | time: 1582.16s | valid loss 1.0126 | valid ppl 2.7527 | learning rate 20.0000
| end of split 31 / 62 | epoch 1 | time: 1582.81s | valid loss 1.0114 | valid ppl 2.7495 | learning rate 20.0000
| end of split 32 / 62 | epoch 1 | time: 1584.10s | valid loss 1.0078 | valid ppl 2.7396 | learning rate 20.0000
| end of split 33 / 62 | epoch 1 | time: 1583.96s | valid loss 1.0067 | valid ppl 2.7367 | learning rate 20.0000
| end of split 34 / 62 | epoch 1 | time: 1584.53s | valid loss 1.0311 | valid ppl 2.8043 | learning rate 20.0000
| end of split 35 / 62 | epoch 1 | time: 1585.34s | valid loss 1.0022 | valid ppl 2.7243 | learning rate 20.0000
| end of split 36 / 62 | epoch 1 | time: 1585.67s | valid loss 1.0017 | valid ppl 2.7229 | learning rate 20.0000
| end of split 37 / 62 | epoch 1 | time: 1583.84s | valid loss 1.0020 | valid ppl 2.7236 | learning rate 20.0000
| end of split 38 / 62 | epoch 1 | time: 1584.28s | valid loss 0.9989 | valid ppl 2.7152 | learning rate 20.0000
| end of split 39 / 62 | epoch 1 | time: 1585.90s | valid loss 1.0254 | valid ppl 2.7882 | learning rate 20.0000
| end of split 40 / 62 | epoch 1 | time: 1588.16s | valid loss 0.9973 | valid ppl 2.7110 | learning rate 20.0000
| end of split 41 / 62 | epoch 1 | time: 1586.15s | valid loss 0.9961 | valid ppl 2.7076 | learning rate 20.0000
| end of split 42 / 62 | epoch 1 | time: 1588.69s | valid loss 0.9963 | valid ppl 2.7083 | learning rate 20.0000
| end of split 43 / 62 | epoch 1 | time: 1588.30s | valid loss 0.9934 | valid ppl 2.7005 | learning rate 20.0000
| end of split 44 / 62 | epoch 1 | time: 1587.86s | valid loss 0.9962 | valid ppl 2.7080 | learning rate 20.0000
| end of split 45 / 62 | epoch 1 | time: 1588.43s | valid loss 0.9921 | valid ppl 2.6970 | learning rate 20.0000
| end of split 46 / 62 | epoch 1 | time: 1591.45s | valid loss 0.9913 | valid ppl 2.6949 | learning rate 20.0000
| end of split 47 / 62 | epoch 1 | time: 1590.01s | valid loss 1.0074 | valid ppl 2.7386 | learning rate 20.0000
| end of split 48 / 62 | epoch 1 | time: 1589.84s | valid loss 0.9891 | valid ppl 2.6889 | learning rate 20.0000
| end of split 49 / 62 | epoch 1 | time: 1591.41s | valid loss 0.9893 | valid ppl 2.6893 | learning rate 20.0000
| end of split 50 / 62 | epoch 1 | time: 1592.88s | valid loss 0.9881 | valid ppl 2.6861 | learning rate 20.0000
| end of split 51 / 62 | epoch 1 | time: 1593.67s | valid loss 0.9872 | valid ppl 2.6836 | learning rate 20.0000
| end of split 52 / 62 | epoch 1 | time: 1593.93s | valid loss 0.9938 | valid ppl 2.7015 | learning rate 20.0000
| end of split 53 / 62 | epoch 1 | time: 1593.15s | valid loss 0.9875 | valid ppl 2.6845 | learning rate 20.0000
| end of split 54 / 62 | epoch 1 | time: 1593.89s | valid loss 0.9844 | valid ppl 2.6763 | learning rate 20.0000
| end of split 55 / 62 | epoch 1 | time: 1594.52s | valid loss 0.9852 | valid ppl 2.6782 | learning rate 20.0000
| end of split 56 / 62 | epoch 1 | time: 1593.26s | valid loss 0.9848 | valid ppl 2.6772 | learning rate 20.0000
| end of split 57 / 62 | epoch 1 | time: 1594.39s | valid loss 0.9827 | valid ppl 2.6717 | learning rate 20.0000
| end of split 58 / 62 | epoch 1 | time: 1593.89s | valid loss 0.9834 | valid ppl 2.6736 | learning rate 20.0000
| end of split 59 / 62 | epoch 1 | time: 1594.99s | valid loss 0.9814 | valid ppl 2.6682 | learning rate 20.0000
| end of split 60 / 62 | epoch 1 | time: 1595.07s | valid loss 0.9885 | valid ppl 2.6871 | learning rate 20.0000
| end of split 61 / 62 | epoch 1 | time: 1593.04s | valid loss 0.9834 | valid ppl 2.6736 | learning rate 20.0000
| end of split 62 / 62 | epoch 1 | time: 850.81s | valid loss 0.9894 | valid ppl 2.6895 | learning rate 20.0000
| end of split 1 / 62 | epoch 2 | time: 1589.43s | valid loss 0.9930 | valid ppl 2.6992 | learning rate 20.0000
| end of split 2 / 62 | epoch 2 | time: 1592.05s | valid loss 0.9823 | valid ppl 2.6706 | learning rate 20.0000
| end of split 3 / 62 | epoch 2 | time: 1591.91s | valid loss 0.9795 | valid ppl 2.6631 | learning rate 20.0000
| end of split 4 / 62 | epoch 2 | time: 1589.81s | valid loss 0.9798 | valid ppl 2.6638 | learning rate 20.0000
| end of split 5 / 62 | epoch 2 | time: 1592.72s | valid loss 0.9863 | valid ppl 2.6812 | learning rate 20.0000
| end of split 6 / 62 | epoch 2 | time: 1591.02s | valid loss 0.9793 | valid ppl 2.6627 | learning rate 20.0000
| end of split 7 / 62 | epoch 2 | time: 1591.96s | valid loss 0.9778 | valid ppl 2.6587 | learning rate 20.0000
| end of split 8 / 62 | epoch 2 | time: 1589.75s | valid loss 0.9770 | valid ppl 2.6565 | learning rate 20.0000
| end of split 9 / 62 | epoch 2 | time: 1589.90s | valid loss 0.9770 | valid ppl 2.6565 | learning rate 20.0000
| end of split 10 / 62 | epoch 2 | time: 1586.76s | valid loss 0.9759 | valid ppl 2.6535 | learning rate 20.0000
| end of split 11 / 62 | epoch 2 | time: 1583.54s | valid loss 0.9783 | valid ppl 2.6600 | learning rate 20.0000
| end of split 12 / 62 | epoch 2 | time: 1585.70s | valid loss 1.0014 | valid ppl 2.7221 | learning rate 20.0000
| end of split 13 / 62 | epoch 2 | time: 1585.88s | valid loss 0.9768 | valid ppl 2.6559 | learning rate 20.0000
| end of split 14 / 62 | epoch 2 | time: 1587.69s | valid loss 0.9754 | valid ppl 2.6523 | learning rate 20.0000
| end of split 15 / 62 | epoch 2 | time: 1586.05s | valid loss 0.9736 | valid ppl 2.6475 | learning rate 20.0000
| end of split 16 / 62 | epoch 2 | time: 1589.38s | valid loss 0.9740 | valid ppl 2.6486 | learning rate 20.0000
| end of split 17 / 62 | epoch 2 | time: 1591.27s | valid loss 0.9756 | valid ppl 2.6527 | learning rate 20.0000
| end of split 18 / 62 | epoch 2 | time: 1590.28s | valid loss 0.9728 | valid ppl 2.6454 | learning rate 20.0000
| end of split 19 / 62 | epoch 2 | time: 1588.81s | valid loss 0.9727 | valid ppl 2.6452 | learning rate 20.0000
| end of split 20 / 62 | epoch 2 | time: 1590.45s | valid loss 0.9723 | valid ppl 2.6440 | learning rate 20.0000
| end of split 21 / 62 | epoch 2 | time: 1587.61s | valid loss 0.9716 | valid ppl 2.6422 | learning rate 20.0000
| end of split 22 / 62 | epoch 2 | time: 1587.52s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
| end of split 23 / 62 | epoch 2 | time: 1587.01s | valid loss 0.9709 | valid ppl 2.6402 | learning rate 20.0000
| end of split 24 / 62 | epoch 2 | time: 1587.21s | valid loss 0.9701 | valid ppl 2.6383 | learning rate 20.0000
| end of split 25 / 62 | epoch 2 | time: 1585.58s | valid loss 0.9713 | valid ppl 2.6413 | learning rate 20.0000
| end of split 26 / 62 | epoch 2 | time: 1582.23s | valid loss 0.9920 | valid ppl 2.6967 | learning rate 20.0000
| end of split 27 / 62 | epoch 2 | time: 1584.31s | valid loss 0.9696 | valid ppl 2.6368 | learning rate 20.0000
| end of split 28 / 62 | epoch 2 | time: 1583.27s | valid loss 0.9690 | valid ppl 2.6353 | learning rate 20.0000
| end of split 29 / 62 | epoch 2 | time: 1583.73s | valid loss 0.9685 | valid ppl 2.6339 | learning rate 20.0000
| end of split 30 / 62 | epoch 2 | time: 1582.01s | valid loss 0.9712 | valid ppl 2.6412 | learning rate 20.0000
| end of split 31 / 62 | epoch 2 | time: 1577.61s | valid loss 0.9698 | valid ppl 2.6374 | learning rate 20.0000
| end of split 32 / 62 | epoch 2 | time: 1576.99s | valid loss 0.9677 | valid ppl 2.6318 | learning rate 20.0000
| end of split 33 / 62 | epoch 2 | time: 1576.05s | valid loss 0.9675 | valid ppl 2.6314 | learning rate 20.0000
| end of split 34 / 62 | epoch 2 | time: 1580.30s | valid loss 0.9668 | valid ppl 2.6296 | learning rate 20.0000
| end of split 35 / 62 | epoch 2 | time: 1580.63s | valid loss 0.9663 | valid ppl 2.6282 | learning rate 20.0000
| end of split 36 / 62 | epoch 2 | time: 1581.22s | valid loss 0.9660 | valid ppl 2.6275 | learning rate 20.0000
| end of split 37 / 62 | epoch 2 | time: 1581.83s | valid loss 0.9668 | valid ppl 2.6295 | learning rate 20.0000
| end of split 38 / 62 | epoch 2 | time: 1583.12s | valid loss 0.9663 | valid ppl 2.6283 | learning rate 20.0000
| end of split 39 / 62 | epoch 2 | time: 1584.87s | valid loss 0.9653 | valid ppl 2.6256 | learning rate 20.0000
| end of split 40 / 62 | epoch 2 | time: 847.08s | valid loss 0.9723 | valid ppl 2.6440 | learning rate 20.0000
| end of split 41 / 62 | epoch 2 | time: 1592.30s | valid loss 0.9707 | valid ppl 2.6398 | learning rate 20.0000
| end of split 42 / 62 | epoch 2 | time: 1602.69s | valid loss 0.9655 | valid ppl 2.6262 | learning rate 20.0000
| end of split 43 / 62 | epoch 2 | time: 1608.11s | valid loss 0.9649 | valid ppl 2.6245 | learning rate 20.0000
| end of split 44 / 62 | epoch 2 | time: 1610.00s | valid loss 0.9641 | valid ppl 2.6225 | learning rate 20.0000
| end of split 45 / 62 | epoch 2 | time: 1590.39s | valid loss 1.0062 | valid ppl 2.7352 | learning rate 20.0000
| end of split 46 / 62 | epoch 2 | time: 1569.29s | valid loss 1.5219 | valid ppl 4.5807 | learning rate 20.0000
| end of split 47 / 62 | epoch 2 | time: 1573.04s | valid loss 1.2816 | valid ppl 3.6023 | learning rate 20.0000
| end of split 48 / 62 | epoch 2 | time: 1575.91s | valid loss 1.1161 | valid ppl 3.0529 | learning rate 20.0000
| end of split 49 / 62 | epoch 2 | time: 1573.44s | valid loss 1.0870 | valid ppl 2.9653 | learning rate 20.0000
| end of split 50 / 62 | epoch 2 | time: 1575.89s | valid loss 1.0426 | valid ppl 2.8367 | learning rate 20.0000
| end of split 51 / 62 | epoch 2 | time: 1578.06s | valid loss 1.0085 | valid ppl 2.7415 | learning rate 20.0000
| end of split 52 / 62 | epoch 2 | time: 1583.24s | valid loss 0.9898 | valid ppl 2.6907 | learning rate 20.0000
| end of split 53 / 62 | epoch 2 | time: 1583.39s | valid loss 0.9789 | valid ppl 2.6617 | learning rate 20.0000
| end of split 54 / 62 | epoch 2 | time: 1582.99s | valid loss 0.9752 | valid ppl 2.6516 | learning rate 20.0000
| end of split 55 / 62 | epoch 2 | time: 1584.67s | valid loss 0.9727 | valid ppl 2.6450 | learning rate 20.0000
| end of split 56 / 62 | epoch 2 | time: 1587.32s | valid loss 0.9680 | valid ppl 2.6327 | learning rate 5.0000
| end of split 57 / 62 | epoch 2 | time: 1589.56s | valid loss 0.9671 | valid ppl 2.6303 | learning rate 5.0000
| end of split 58 / 62 | epoch 2 | time: 1590.23s | valid loss 0.9665 | valid ppl 2.6286 | learning rate 5.0000
| end of split 59 / 62 | epoch 2 | time: 1592.84s | valid loss 0.9658 | valid ppl 2.6270 | learning rate 5.0000
| end of split 60 / 62 | epoch 2 | time: 1593.67s | valid loss 0.9652 | valid ppl 2.6253 | learning rate 5.0000
| end of split 61 / 62 | epoch 2 | time: 1593.45s | valid loss 0.9671 | valid ppl 2.6303 | learning rate 5.0000
| end of split 62 / 62 | epoch 2 | time: 1592.63s | valid loss 0.9642 | valid ppl 2.6228 | learning rate 5.0000
| end of split 1 / 62 | epoch 3 | time: 1588.48s | valid loss 0.9639 | valid ppl 2.6219 | learning rate 5.0000
| end of split 2 / 62 | epoch 3 | time: 1595.00s | valid loss 0.9635 | valid ppl 2.6208 | learning rate 5.0000
| end of split 3 / 62 | epoch 3 | time: 1592.33s | valid loss 0.9631 | valid ppl 2.6197 | learning rate 5.0000
| end of split 4 / 62 | epoch 3 | time: 1592.28s | valid loss 0.9630 | valid ppl 2.6194 | learning rate 5.0000
| end of split 5 / 62 | epoch 3 | time: 1592.85s | valid loss 0.9626 | valid ppl 2.6184 | learning rate 5.0000
| end of split 6 / 62 | epoch 3 | time: 1592.84s | valid loss 0.9622 | valid ppl 2.6173 | learning rate 5.0000
| end of split 7 / 62 | epoch 3 | time: 1592.00s | valid loss 0.9619 | valid ppl 2.6167 | learning rate 5.0000
| end of split 8 / 62 | epoch 3 | time: 1593.04s | valid loss 0.9616 | valid ppl 2.6159 | learning rate 5.0000
| end of split 9 / 62 | epoch 3 | time: 1592.29s | valid loss 0.9615 | valid ppl 2.6155 | learning rate 5.0000
| end of split 10 / 62 | epoch 3 | time: 1590.81s | valid loss 0.9612 | valid ppl 2.6149 | learning rate 5.0000
| end of split 11 / 62 | epoch 3 | time: 1591.61s | valid loss 0.9611 | valid ppl 2.6146 | learning rate 5.0000
| end of split 12 / 62 | epoch 3 | time: 1590.51s | valid loss 0.9609 | valid ppl 2.6141 | learning rate 5.0000
| end of split 13 / 62 | epoch 3 | time: 1590.78s | valid loss 0.9604 | valid ppl 2.6127 | learning rate 5.0000
| end of split 14 / 62 | epoch 3 | time: 1589.97s | valid loss 0.9604 | valid ppl 2.6126 | learning rate 5.0000
| end of split 15 / 62 | epoch 3 | time: 1589.70s | valid loss 0.9600 | valid ppl 2.6117 | learning rate 5.0000
| end of split 16 / 62 | epoch 3 | time: 1589.05s | valid loss 0.9600 | valid ppl 2.6118 | learning rate 5.0000
| end of split 17 / 62 | epoch 3 | time: 1589.99s | valid loss 0.9596 | valid ppl 2.6107 | learning rate 5.0000
| end of split 18 / 62 | epoch 3 | time: 1590.63s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 5.0000
| end of split 19 / 62 | epoch 3 | time: 1588.73s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 5.0000
| end of split 20 / 62 | epoch 3 | time: 1589.71s | valid loss 0.9589 | valid ppl 2.6088 | learning rate 5.0000
| end of split 21 / 62 | epoch 3 | time: 1589.46s | valid loss 0.9588 | valid ppl 2.6086 | learning rate 5.0000
| end of split 22 / 62 | epoch 3 | time: 1589.12s | valid loss 0.9586 | valid ppl 2.6080 | learning rate 5.0000
| end of split 23 / 62 | epoch 3 | time: 1591.71s | valid loss 0.9589 | valid ppl 2.6088 | learning rate 5.0000
| end of split 24 / 62 | epoch 3 | time: 1589.39s | valid loss 0.9582 | valid ppl 2.6070 | learning rate 5.0000
| end of split 25 / 62 | epoch 3 | time: 1590.33s | valid loss 0.9582 | valid ppl 2.6070 | learning rate 5.0000
| end of split 26 / 62 | epoch 3 | time: 1589.33s | valid loss 0.9580 | valid ppl 2.6065 | learning rate 5.0000
| end of split 27 / 62 | epoch 3 | time: 1589.70s | valid loss 0.9580 | valid ppl 2.6066 | learning rate 5.0000
| end of split 28 / 62 | epoch 3 | time: 1589.72s | valid loss 0.9578 | valid ppl 2.6060 | learning rate 5.0000
| end of split 29 / 62 | epoch 3 | time: 849.01s | valid loss 0.9583 | valid ppl 2.6072 | learning rate 5.0000
| end of split 30 / 62 | epoch 3 | time: 1592.01s | valid loss 0.9576 | valid ppl 2.6055 | learning rate 5.0000
| end of split 31 / 62 | epoch 3 | time: 1593.91s | valid loss 0.9574 | valid ppl 2.6048 | learning rate 5.0000
| end of split 32 / 62 | epoch 3 | time: 1593.53s | valid loss 0.9573 | valid ppl 2.6047 | learning rate 5.0000
| end of split 33 / 62 | epoch 3 | time: 1593.28s | valid loss 0.9573 | valid ppl 2.6047 | learning rate 5.0000
| end of split 34 / 62 | epoch 3 | time: 1592.56s | valid loss 0.9571 | valid ppl 2.6040 | learning rate 5.0000
| end of split 35 / 62 | epoch 3 | time: 1594.00s | valid loss 0.9569 | valid ppl 2.6037 | learning rate 5.0000
| end of split 36 / 62 | epoch 3 | time: 1592.16s | valid loss 0.9580 | valid ppl 2.6064 | learning rate 5.0000
| end of split 37 / 62 | epoch 3 | time: 1593.97s | valid loss 0.9569 | valid ppl 2.6037 | learning rate 5.0000
| end of split 38 / 62 | epoch 3 | time: 1595.62s | valid loss 0.9566 | valid ppl 2.6029 | learning rate 5.0000
| end of split 39 / 62 | epoch 3 | time: 1595.26s | valid loss 0.9565 | valid ppl 2.6025 | learning rate 5.0000
| end of split 40 / 62 | epoch 3 | time: 1595.91s | valid loss 0.9565 | valid ppl 2.6025 | learning rate 5.0000
| end of split 41 / 62 | epoch 3 | time: 1597.34s | valid loss 0.9562 | valid ppl 2.6019 | learning rate 5.0000
| end of split 42 / 62 | epoch 3 | time: 1600.88s | valid loss 0.9561 | valid ppl 2.6015 | learning rate 5.0000
| end of split 43 / 62 | epoch 3 | time: 1601.74s | valid loss 0.9559 | valid ppl 2.6010 | learning rate 5.0000
| end of split 44 / 62 | epoch 3 | time: 1603.40s | valid loss 0.9562 | valid ppl 2.6018 | learning rate 5.0000
| end of split 45 / 62 | epoch 3 | time: 1601.88s | valid loss 0.9557 | valid ppl 2.6004 | learning rate 5.0000
| end of split 46 / 62 | epoch 3 | time: 1602.03s | valid loss 0.9556 | valid ppl 2.6002 | learning rate 5.0000
| end of split 47 / 62 | epoch 3 | time: 1601.98s | valid loss 0.9555 | valid ppl 2.5999 | learning rate 5.0000
| end of split 48 / 62 | epoch 3 | time: 1603.86s | valid loss 0.9555 | valid ppl 2.6001 | learning rate 5.0000
| end of split 49 / 62 | epoch 3 | time: 1600.52s | valid loss 0.9556 | valid ppl 2.6002 | learning rate 5.0000
| end of split 50 / 62 | epoch 3 | time: 1597.63s | valid loss 0.9549 | valid ppl 2.5985 | learning rate 5.0000
| end of split 51 / 62 | epoch 3 | time: 1600.65s | valid loss 0.9550 | valid ppl 2.5987 | learning rate 5.0000
| end of split 52 / 62 | epoch 3 | time: 1599.09s | valid loss 0.9549 | valid ppl 2.5984 | learning rate 5.0000
| end of split 53 / 62 | epoch 3 | time: 1599.84s | valid loss 0.9549 | valid ppl 2.5983 | learning rate 5.0000
| end of split 54 / 62 | epoch 3 | time: 1597.92s | valid loss 0.9547 | valid ppl 2.5980 | learning rate 5.0000
| end of split 55 / 62 | epoch 3 | time: 1598.06s | valid loss 0.9546 | valid ppl 2.5976 | learning rate 5.0000
| end of split 56 / 62 | epoch 3 | time: 1597.08s | valid loss 0.9544 | valid ppl 2.5970 | learning rate 5.0000
| end of split 57 / 62 | epoch 3 | time: 1596.42s | valid loss 0.9544 | valid ppl 2.5971 | learning rate 5.0000
| end of split 58 / 62 | epoch 3 | time: 1597.40s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 5.0000
| end of split 59 / 62 | epoch 3 | time: 1596.76s | valid loss 0.9539 | valid ppl 2.5959 | learning rate 5.0000
| end of split 60 / 62 | epoch 3 | time: 1594.38s | valid loss 0.9540 | valid ppl 2.5962 | learning rate 5.0000
| end of split 61 / 62 | epoch 3 | time: 1595.01s | valid loss 0.9550 | valid ppl 2.5988 | learning rate 5.0000
| end of split 62 / 62 | epoch 3 | time: 1596.06s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 5.0000
| end of split 1 / 62 | epoch 4 | time: 1590.51s | valid loss 0.9539 | valid ppl 2.5959 | learning rate 5.0000
| end of split 2 / 62 | epoch 4 | time: 1594.92s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 5.0000
| end of split 3 / 62 | epoch 4 | time: 1594.53s | valid loss 0.9536 | valid ppl 2.5950 | learning rate 5.0000
| end of split 4 / 62 | epoch 4 | time: 1595.50s | valid loss 0.9534 | valid ppl 2.5946 | learning rate 5.0000
| end of split 5 / 62 | epoch 4 | time: 1594.79s | valid loss 0.9535 | valid ppl 2.5947 | learning rate 5.0000
| end of split 6 / 62 | epoch 4 | time: 1595.23s | valid loss 0.9535 | valid ppl 2.5948 | learning rate 5.0000
| end of split 7 / 62 | epoch 4 | time: 1594.51s | valid loss 0.9535 | valid ppl 2.5948 | learning rate 5.0000
| end of split 8 / 62 | epoch 4 | time: 1595.67s | valid loss 0.9531 | valid ppl 2.5938 | learning rate 5.0000
| end of split 9 / 62 | epoch 4 | time: 1594.19s | valid loss 0.9533 | valid ppl 2.5942 | learning rate 5.0000
| end of split 10 / 62 | epoch 4 | time: 1596.43s | valid loss 0.9530 | valid ppl 2.5935 | learning rate 5.0000
| end of split 11 / 62 | epoch 4 | time: 1594.75s | valid loss 0.9533 | valid ppl 2.5944 | learning rate 5.0000
| end of split 12 / 62 | epoch 4 | time: 1593.83s | valid loss 0.9530 | valid ppl 2.5934 | learning rate 5.0000
| end of split 13 / 62 | epoch 4 | time: 1593.87s | valid loss 0.9530 | valid ppl 2.5934 | learning rate 5.0000
| end of split 14 / 62 | epoch 4 | time: 1595.57s | valid loss 0.9529 | valid ppl 2.5933 | learning rate 5.0000
| end of split 15 / 62 | epoch 4 | time: 1597.27s | valid loss 0.9527 | valid ppl 2.5927 | learning rate 5.0000
| end of split 16 / 62 | epoch 4 | time: 1594.24s | valid loss 0.9526 | valid ppl 2.5924 | learning rate 5.0000
| end of split 17 / 62 | epoch 4 | time: 1594.23s | valid loss 0.9527 | valid ppl 2.5927 | learning rate 5.0000
| end of split 18 / 62 | epoch 4 | time: 1595.12s | valid loss 0.9524 | valid ppl 2.5918 | learning rate 5.0000
| end of split 19 / 62 | epoch 4 | time: 1595.95s | valid loss 0.9524 | valid ppl 2.5920 | learning rate 5.0000
| end of split 20 / 62 | epoch 4 | time: 1594.70s | valid loss 0.9522 | valid ppl 2.5913 | learning rate 5.0000
| end of split 21 / 62 | epoch 4 | time: 1594.57s | valid loss 0.9520 | valid ppl 2.5908 | learning rate 5.0000
| end of split 22 / 62 | epoch 4 | time: 1594.91s | valid loss 0.9520 | valid ppl 2.5908 | learning rate 5.0000
| end of split 23 / 62 | epoch 4 | time: 1594.17s | valid loss 0.9519 | valid ppl 2.5906 | learning rate 5.0000
| end of split 24 / 62 | epoch 4 | time: 1593.85s | valid loss 0.9519 | valid ppl 2.5906 | learning rate 5.0000
| end of split 25 / 62 | epoch 4 | time: 1594.37s | valid loss 0.9519 | valid ppl 2.5907 | learning rate 5.0000
| end of split 26 / 62 | epoch 4 | time: 1595.05s | valid loss 0.9516 | valid ppl 2.5898 | learning rate 5.0000
| end of split 27 / 62 | epoch 4 | time: 1596.66s | valid loss 0.9516 | valid ppl 2.5898 | learning rate 5.0000
| end of split 28 / 62 | epoch 4 | time: 1597.62s | valid loss 0.9522 | valid ppl 2.5915 | learning rate 5.0000
| end of split 29 / 62 | epoch 4 | time: 1596.01s | valid loss 0.9514 | valid ppl 2.5893 | learning rate 5.0000
| end of split 30 / 62 | epoch 4 | time: 1596.94s | valid loss 0.9514 | valid ppl 2.5895 | learning rate 5.0000
| end of split 31 / 62 | epoch 4 | time: 1596.59s | valid loss 0.9515 | valid ppl 2.5895 | learning rate 5.0000
| end of split 32 / 62 | epoch 4 | time: 1594.91s | valid loss 0.9513 | valid ppl 2.5892 | learning rate 5.0000
| end of split 33 / 62 | epoch 4 | time: 1596.39s | valid loss 0.9512 | valid ppl 2.5888 | learning rate 5.0000
| end of split 34 / 62 | epoch 4 | time: 1596.82s | valid loss 0.9512 | valid ppl 2.5888 | learning rate 5.0000
| end of split 35 / 62 | epoch 4 | time: 1597.66s | valid loss 0.9511 | valid ppl 2.5886 | learning rate 5.0000
| end of split 36 / 62 | epoch 4 | time: 1598.20s | valid loss 0.9516 | valid ppl 2.5899 | learning rate 5.0000
| end of split 37 / 62 | epoch 4 | time: 1598.02s | valid loss 0.9510 | valid ppl 2.5883 | learning rate 5.0000
| end of split 38 / 62 | epoch 4 | time: 1597.10s | valid loss 0.9509 | valid ppl 2.5881 | learning rate 5.0000
| end of split 39 / 62 | epoch 4 | time: 1599.56s | valid loss 0.9509 | valid ppl 2.5879 | learning rate 5.0000
| end of split 40 / 62 | epoch 4 | time: 1597.81s | valid loss 0.9510 | valid ppl 2.5882 | learning rate 5.0000
| end of split 41 / 62 | epoch 4 | time: 1598.85s | valid loss 0.9507 | valid ppl 2.5876 | learning rate 5.0000
| end of split 42 / 62 | epoch 4 | time: 1597.13s | valid loss 0.9507 | valid ppl 2.5875 | learning rate 5.0000
| end of split 43 / 62 | epoch 4 | time: 1598.31s | valid loss 0.9508 | valid ppl 2.5877 | learning rate 5.0000
| end of split 44 / 62 | epoch 4 | time: 1597.29s | valid loss 0.9507 | valid ppl 2.5874 | learning rate 5.0000
| end of split 45 / 62 | epoch 4 | time: 1595.76s | valid loss 0.9508 | valid ppl 2.5877 | learning rate 5.0000
| end of split 46 / 62 | epoch 4 | time: 1597.26s | valid loss 0.9506 | valid ppl 2.5872 | learning rate 5.0000
| end of split 47 / 62 | epoch 4 | time: 1596.63s | valid loss 0.9504 | valid ppl 2.5868 | learning rate 5.0000
| end of split 48 / 62 | epoch 4 | time: 1597.06s | valid loss 0.9503 | valid ppl 2.5866 | learning rate 5.0000
| end of split 49 / 62 | epoch 4 | time: 1596.32s | valid loss 0.9501 | valid ppl 2.5860 | learning rate 5.0000
| end of split 50 / 62 | epoch 4 | time: 852.39s | valid loss 0.9507 | valid ppl 2.5876 | learning rate 5.0000
| end of split 51 / 62 | epoch 4 | time: 1596.92s | valid loss 0.9500 | valid ppl 2.5857 | learning rate 5.0000
| end of split 52 / 62 | epoch 4 | time: 1595.75s | valid loss 0.9505 | valid ppl 2.5869 | learning rate 5.0000
| end of split 53 / 62 | epoch 4 | time: 1593.59s | valid loss 0.9501 | valid ppl 2.5858 | learning rate 5.0000
| end of split 54 / 62 | epoch 4 | time: 1594.38s | valid loss 0.9509 | valid ppl 2.5881 | learning rate 5.0000
| end of split 55 / 62 | epoch 4 | time: 1593.89s | valid loss 0.9496 | valid ppl 2.5848 | learning rate 5.0000
| end of split 56 / 62 | epoch 4 | time: 1593.86s | valid loss 0.9499 | valid ppl 2.5854 | learning rate 5.0000
| end of split 57 / 62 | epoch 4 | time: 1592.65s | valid loss 0.9496 | valid ppl 2.5846 | learning rate 5.0000
| end of split 58 / 62 | epoch 4 | time: 1593.43s | valid loss 0.9497 | valid ppl 2.5850 | learning rate 5.0000
| end of split 59 / 62 | epoch 4 | time: 1590.22s | valid loss 0.9496 | valid ppl 2.5846 | learning rate 5.0000
| end of split 60 / 62 | epoch 4 | time: 1592.59s | valid loss 0.9494 | valid ppl 2.5840 | learning rate 5.0000
| end of split 61 / 62 | epoch 4 | time: 1590.49s | valid loss 0.9494 | valid ppl 2.5842 | learning rate 5.0000
| end of split 62 / 62 | epoch 4 | time: 1592.95s | valid loss 0.9494 | valid ppl 2.5841 | learning rate 5.0000
| end of split 1 / 62 | epoch 5 | time: 1588.63s | valid loss 0.9495 | valid ppl 2.5845 | learning rate 5.0000
| end of split 2 / 62 | epoch 5 | time: 1594.59s | valid loss 0.9492 | valid ppl 2.5837 | learning rate 5.0000
| end of split 3 / 62 | epoch 5 | time: 1595.14s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
| end of split 4 / 62 | epoch 5 | time: 1593.00s | valid loss 0.9491 | valid ppl 2.5833 | learning rate 5.0000
| end of split 5 / 62 | epoch 5 | time: 1592.16s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
| end of split 6 / 62 | epoch 5 | time: 1592.38s | valid loss 0.9491 | valid ppl 2.5833 | learning rate 5.0000
| end of split 7 / 62 | epoch 5 | time: 1593.78s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
| end of split 8 / 62 | epoch 5 | time: 1594.50s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
| end of split 9 / 62 | epoch 5 | time: 1594.20s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
| end of split 10 / 62 | epoch 5 | time: 1594.41s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
| end of split 11 / 62 | epoch 5 | time: 1592.91s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
| end of split 12 / 62 | epoch 5 | time: 1595.00s | valid loss 0.9494 | valid ppl 2.5842 | learning rate 5.0000
| end of split 13 / 62 | epoch 5 | time: 1592.84s | valid loss 0.9486 | valid ppl 2.5822 | learning rate 5.0000
| end of split 14 / 62 | epoch 5 | time: 1593.26s | valid loss 0.9485 | valid ppl 2.5819 | learning rate 5.0000
| end of split 15 / 62 | epoch 5 | time: 1592.76s | valid loss 0.9486 | valid ppl 2.5822 | learning rate 5.0000
| end of split 16 / 62 | epoch 5 | time: 1595.66s | valid loss 0.9483 | valid ppl 2.5814 | learning rate 5.0000
| end of split 17 / 62 | epoch 5 | time: 1596.12s | valid loss 0.9484 | valid ppl 2.5816 | learning rate 5.0000
| end of split 18 / 62 | epoch 5 | time: 1597.15s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
| end of split 19 / 62 | epoch 5 | time: 1595.50s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
| end of split 20 / 62 | epoch 5 | time: 1597.42s | valid loss 0.9482 | valid ppl 2.5812 | learning rate 5.0000
| end of split 21 / 62 | epoch 5 | time: 1596.20s | valid loss 0.9483 | valid ppl 2.5814 | learning rate 5.0000
| end of split 22 / 62 | epoch 5 | time: 1597.06s | valid loss 0.9479 | valid ppl 2.5804 | learning rate 5.0000
| end of split 23 / 62 | epoch 5 | time: 1596.92s | valid loss 0.9479 | valid ppl 2.5803 | learning rate 5.0000
| end of split 24 / 62 | epoch 5 | time: 1593.52s | valid loss 0.9481 | valid ppl 2.5807 | learning rate 5.0000
| end of split 25 / 62 | epoch 5 | time: 1595.12s | valid loss 0.9480 | valid ppl 2.5805 | learning rate 5.0000
| end of split 26 / 62 | epoch 5 | time: 1595.25s | valid loss 0.9479 | valid ppl 2.5802 | learning rate 5.0000
| end of split 27 / 62 | epoch 5 | time: 1644.92s | valid loss 0.9477 | valid ppl 2.5799 | learning rate 5.0000
| end of split 28 / 62 | epoch 5 | time: 1595.94s | valid loss 0.9478 | valid ppl 2.5801 | learning rate 5.0000
| end of split 29 / 62 | epoch 5 | time: 1596.39s | valid loss 0.9489 | valid ppl 2.5830 | learning rate 5.0000
| end of split 30 / 62 | epoch 5 | time: 1596.48s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
| end of split 31 / 62 | epoch 5 | time: 1594.94s | valid loss 0.9480 | valid ppl 2.5805 | learning rate 5.0000
| end of split 32 / 62 | epoch 5 | time: 1596.25s | valid loss 0.9477 | valid ppl 2.5799 | learning rate 5.0000
| end of split 33 / 62 | epoch 5 | time: 1595.95s | valid loss 0.9476 | valid ppl 2.5795 | learning rate 5.0000
| end of split 34 / 62 | epoch 5 | time: 1594.31s | valid loss 0.9474 | valid ppl 2.5791 | learning rate 5.0000
| end of split 35 / 62 | epoch 5 | time: 1595.73s | valid loss 0.9475 | valid ppl 2.5792 | learning rate 5.0000
| end of split 36 / 62 | epoch 5 | time: 1593.93s | valid loss 0.9476 | valid ppl 2.5794 | learning rate 5.0000
| end of split 37 / 62 | epoch 5 | time: 1594.50s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
| end of split 38 / 62 | epoch 5 | time: 1592.84s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
| end of split 39 / 62 | epoch 5 | time: 1591.33s | valid loss 0.9473 | valid ppl 2.5788 | learning rate 5.0000
| end of split 40 / 62 | epoch 5 | time: 1590.07s | valid loss 0.9471 | valid ppl 2.5783 | learning rate 5.0000
| end of split 41 / 62 | epoch 5 | time: 1591.27s | valid loss 0.9474 | valid ppl 2.5791 | learning rate 5.0000
| end of split 42 / 62 | epoch 5 | time: 1590.29s | valid loss 0.9471 | valid ppl 2.5782 | learning rate 5.0000
| end of split 43 / 62 | epoch 5 | time: 1590.07s | valid loss 0.9470 | valid ppl 2.5780 | learning rate 5.0000
| end of split 44 / 62 | epoch 5 | time: 1590.49s | valid loss 0.9471 | valid ppl 2.5781 | learning rate 5.0000
| end of split 45 / 62 | epoch 5 | time: 1589.80s | valid loss 0.9473 | valid ppl 2.5787 | learning rate 5.0000
| end of split 46 / 62 | epoch 5 | time: 1588.77s | valid loss 0.9470 | valid ppl 2.5779 | learning rate 5.0000
| end of split 47 / 62 | epoch 5 | time: 1589.22s | valid loss 0.9468 | valid ppl 2.5773 | learning rate 5.0000
| end of split 48 / 62 | epoch 5 | time: 1590.14s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
| end of split 49 / 62 | epoch 5 | time: 1587.40s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
| end of split 50 / 62 | epoch 5 | time: 847.83s | valid loss 0.9472 | valid ppl 2.5786 | learning rate 5.0000
| end of split 51 / 62 | epoch 5 | time: 1588.35s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
| end of split 52 / 62 | epoch 5 | time: 1587.80s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
| end of split 53 / 62 | epoch 5 | time: 1588.01s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
| end of split 54 / 62 | epoch 5 | time: 1585.93s | valid loss 0.9465 | valid ppl 2.5767 | learning rate 5.0000
| end of split 55 / 62 | epoch 5 | time: 1584.78s | valid loss 0.9463 | valid ppl 2.5763 | learning rate 5.0000
| end of split 56 / 62 | epoch 5 | time: 1585.77s | valid loss 0.9481 | valid ppl 2.5808 | learning rate 5.0000
| end of split 57 / 62 | epoch 5 | time: 1586.16s | valid loss 0.9465 | valid ppl 2.5766 | learning rate 5.0000
| end of split 58 / 62 | epoch 5 | time: 1586.35s | valid loss 0.9464 | valid ppl 2.5765 | learning rate 5.0000
| end of split 59 / 62 | epoch 5 | time: 1585.15s | valid loss 0.9463 | valid ppl 2.5762 | learning rate 5.0000
| end of split 60 / 62 | epoch 5 | time: 1585.41s | valid loss 0.9473 | valid ppl 2.5788 | learning rate 5.0000
| end of split 61 / 62 | epoch 5 | time: 1586.84s | valid loss 0.9462 | valid ppl 2.5760 | learning rate 5.0000
| end of split 62 / 62 | epoch 5 | time: 1585.85s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
| end of split 1 / 62 | epoch 6 | time: 1580.81s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
| end of split 2 / 62 | epoch 6 | time: 1585.96s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 3 / 62 | epoch 6 | time: 1586.43s | valid loss 0.9461 | valid ppl 2.5757 | learning rate 5.0000
| end of split 4 / 62 | epoch 6 | time: 1591.11s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
| end of split 5 / 62 | epoch 6 | time: 1593.60s | valid loss 0.9458 | valid ppl 2.5749 | learning rate 5.0000
| end of split 6 / 62 | epoch 6 | time: 1594.82s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
| end of split 7 / 62 | epoch 6 | time: 1599.91s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
| end of split 8 / 62 | epoch 6 | time: 1601.71s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
| end of split 9 / 62 | epoch 6 | time: 1597.62s | valid loss 0.9458 | valid ppl 2.5747 | learning rate 5.0000
| end of split 10 / 62 | epoch 6 | time: 1600.06s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
| end of split 11 / 62 | epoch 6 | time: 1596.53s | valid loss 0.9455 | valid ppl 2.5740 | learning rate 5.0000
| end of split 12 / 62 | epoch 6 | time: 1599.04s | valid loss 0.9456 | valid ppl 2.5745 | learning rate 5.0000
| end of split 13 / 62 | epoch 6 | time: 1593.55s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
| end of split 14 / 62 | epoch 6 | time: 1596.25s | valid loss 0.9454 | valid ppl 2.5740 | learning rate 5.0000
| end of split 15 / 62 | epoch 6 | time: 1595.15s | valid loss 0.9454 | valid ppl 2.5740 | learning rate 5.0000
| end of split 16 / 62 | epoch 6 | time: 1595.84s | valid loss 0.9454 | valid ppl 2.5738 | learning rate 5.0000
| end of split 17 / 62 | epoch 6 | time: 1597.05s | valid loss 0.9453 | valid ppl 2.5737 | learning rate 5.0000
| end of split 18 / 62 | epoch 6 | time: 1595.68s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
| end of split 19 / 62 | epoch 6 | time: 1595.81s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 20 / 62 | epoch 6 | time: 1596.74s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 21 / 62 | epoch 6 | time: 1596.50s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 22 / 62 | epoch 6 | time: 1596.57s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
| end of split 23 / 62 | epoch 6 | time: 1597.51s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
| end of split 24 / 62 | epoch 6 | time: 1597.85s | valid loss 0.9453 | valid ppl 2.5735 | learning rate 5.0000
| end of split 25 / 62 | epoch 6 | time: 1595.58s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
| end of split 26 / 62 | epoch 6 | time: 1599.43s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
| end of split 27 / 62 | epoch 6 | time: 1625.16s | valid loss 0.9454 | valid ppl 2.5737 | learning rate 5.0000
| end of split 28 / 62 | epoch 6 | time: 1677.11s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
| end of split 29 / 62 | epoch 6 | time: 1664.87s | valid loss 0.9500 | valid ppl 2.5857 | learning rate 5.0000
| end of split 30 / 62 | epoch 6 | time: 1610.42s | valid loss 0.9491 | valid ppl 2.5834 | learning rate 5.0000
| end of split 31 / 62 | epoch 6 | time: 1613.54s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
| end of split 32 / 62 | epoch 6 | time: 1616.62s | valid loss 0.9463 | valid ppl 2.5762 | learning rate 5.0000
| end of split 33 / 62 | epoch 6 | time: 1619.63s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
| end of split 34 / 62 | epoch 6 | time: 1617.77s | valid loss 0.9452 | valid ppl 2.5735 | learning rate 5.0000
| end of split 35 / 62 | epoch 6 | time: 1616.49s | valid loss 0.9447 | valid ppl 2.5720 | learning rate 1.2500
| end of split 36 / 62 | epoch 6 | time: 1617.61s | valid loss 0.9443 | valid ppl 2.5711 | learning rate 1.2500
| end of split 37 / 62 | epoch 6 | time: 1619.28s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 1.2500
| end of split 38 / 62 | epoch 6 | time: 1620.03s | valid loss 0.9439 | valid ppl 2.5700 | learning rate 1.2500
| end of split 39 / 62 | epoch 6 | time: 1621.32s | valid loss 0.9438 | valid ppl 2.5698 | learning rate 1.2500
| end of split 40 / 62 | epoch 6 | time: 1625.63s | valid loss 0.9437 | valid ppl 2.5695 | learning rate 1.2500
| end of split 41 / 62 | epoch 6 | time: 1625.86s | valid loss 0.9437 | valid ppl 2.5696 | learning rate 1.2500
| end of split 42 / 62 | epoch 6 | time: 1625.70s | valid loss 0.9436 | valid ppl 2.5692 | learning rate 1.2500
| end of split 43 / 62 | epoch 6 | time: 1629.22s | valid loss 0.9436 | valid ppl 2.5691 | learning rate 1.2500
| end of split 44 / 62 | epoch 6 | time: 1628.58s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
| end of split 45 / 62 | epoch 6 | time: 870.27s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
| end of split 46 / 62 | epoch 6 | time: 1629.99s | valid loss 0.9434 | valid ppl 2.5688 | learning rate 1.2500
| end of split 47 / 62 | epoch 6 | time: 1629.90s | valid loss 0.9435 | valid ppl 2.5689 | learning rate 1.2500
| end of split 48 / 62 | epoch 6 | time: 1628.52s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
| end of split 49 / 62 | epoch 6 | time: 1631.93s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
| end of split 50 / 62 | epoch 6 | time: 1627.56s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
| end of split 51 / 62 | epoch 6 | time: 1628.79s | valid loss 0.9433 | valid ppl 2.5683 | learning rate 1.2500
| end of split 52 / 62 | epoch 6 | time: 1630.13s | valid loss 0.9434 | valid ppl 2.5686 | learning rate 1.2500
| end of split 53 / 62 | epoch 6 | time: 1630.48s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
| end of split 54 / 62 | epoch 6 | time: 1629.97s | valid loss 0.9432 | valid ppl 2.5681 | learning rate 1.2500
| end of split 55 / 62 | epoch 6 | time: 1622.82s | valid loss 0.9432 | valid ppl 2.5682 | learning rate 1.2500
| end of split 56 / 62 | epoch 6 | time: 1624.52s | valid loss 0.9431 | valid ppl 2.5680 | learning rate 1.2500
| end of split 57 / 62 | epoch 6 | time: 1626.41s | valid loss 0.9431 | valid ppl 2.5679 | learning rate 1.2500
| end of split 58 / 62 | epoch 6 | time: 1625.56s | valid loss 0.9434 | valid ppl 2.5686 | learning rate 1.2500
| end of split 59 / 62 | epoch 6 | time: 1627.15s | valid loss 0.9431 | valid ppl 2.5678 | learning rate 1.2500
| end of split 60 / 62 | epoch 6 | time: 1627.44s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 61 / 62 | epoch 6 | time: 1627.57s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
| end of split 62 / 62 | epoch 6 | time: 1625.18s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
| end of split 1 / 62 | epoch 7 | time: 1620.40s | valid loss 0.9429 | valid ppl 2.5675 | learning rate 1.2500
| end of split 2 / 62 | epoch 7 | time: 1627.79s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 3 / 62 | epoch 7 | time: 1627.64s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 4 / 62 | epoch 7 | time: 1626.87s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 5 / 62 | epoch 7 | time: 1628.51s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 6 / 62 | epoch 7 | time: 1627.38s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 7 / 62 | epoch 7 | time: 1624.51s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 8 / 62 | epoch 7 | time: 1622.62s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
| end of split 9 / 62 | epoch 7 | time: 1624.24s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 10 / 62 | epoch 7 | time: 1625.57s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
| end of split 11 / 62 | epoch 7 | time: 1625.67s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
| end of split 12 / 62 | epoch 7 | time: 1716.44s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
| end of split 13 / 62 | epoch 7 | time: 1794.58s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 14 / 62 | epoch 7 | time: 1783.52s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
| end of split 15 / 62 | epoch 7 | time: 1769.46s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 16 / 62 | epoch 7 | time: 1775.92s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 17 / 62 | epoch 7 | time: 1777.89s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 18 / 62 | epoch 7 | time: 1783.47s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 19 / 62 | epoch 7 | time: 1779.88s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 20 / 62 | epoch 7 | time: 1763.54s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 21 / 62 | epoch 7 | time: 1772.71s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 22 / 62 | epoch 7 | time: 1775.60s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 23 / 62 | epoch 7 | time: 1782.51s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
| end of split 24 / 62 | epoch 7 | time: 1754.16s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 25 / 62 | epoch 7 | time: 941.64s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
| end of split 26 / 62 | epoch 7 | time: 1763.95s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
| end of split 27 / 62 | epoch 7 | time: 1776.44s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 28 / 62 | epoch 7 | time: 1768.74s | valid loss 0.9426 | valid ppl 2.5665 | learning rate 1.2500
| end of split 29 / 62 | epoch 7 | time: 1800.52s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 30 / 62 | epoch 7 | time: 1815.90s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 31 / 62 | epoch 7 | time: 1745.49s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 32 / 62 | epoch 7 | time: 1613.56s | valid loss 0.9425 | valid ppl 2.5664 | learning rate 1.2500
| end of split 33 / 62 | epoch 7 | time: 1628.29s | valid loss 0.9425 | valid ppl 2.5665 | learning rate 1.2500
| end of split 34 / 62 | epoch 7 | time: 1624.90s | valid loss 0.9425 | valid ppl 2.5663 | learning rate 1.2500
| end of split 35 / 62 | epoch 7 | time: 1626.26s | valid loss 0.9425 | valid ppl 2.5664 | learning rate 1.2500
| end of split 36 / 62 | epoch 7 | time: 1603.86s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 1.2500
| end of split 37 / 62 | epoch 7 | time: 1605.85s | valid loss 0.9424 | valid ppl 2.5663 | learning rate 1.2500
| end of split 38 / 62 | epoch 7 | time: 1603.91s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
| end of split 39 / 62 | epoch 7 | time: 1605.22s | valid loss 0.9425 | valid ppl 2.5663 | learning rate 1.2500
| end of split 40 / 62 | epoch 7 | time: 1602.75s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
| end of split 41 / 62 | epoch 7 | time: 1604.28s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 1.2500
| end of split 42 / 62 | epoch 7 | time: 1603.89s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
| end of split 43 / 62 | epoch 7 | time: 1603.60s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 44 / 62 | epoch 7 | time: 1606.62s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
| end of split 45 / 62 | epoch 7 | time: 1604.77s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 1.2500
| end of split 46 / 62 | epoch 7 | time: 1603.10s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.3125
| end of split 47 / 62 | epoch 7 | time: 1601.62s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.3125
| end of split 48 / 62 | epoch 7 | time: 1604.55s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.3125
| end of split 49 / 62 | epoch 7 | time: 1604.48s | valid loss 0.9421 | valid ppl 2.5654 | learning rate 0.3125
| end of split 50 / 62 | epoch 7 | time: 1603.34s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
| end of split 51 / 62 | epoch 7 | time: 1600.92s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
| end of split 52 / 62 | epoch 7 | time: 1604.70s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
| end of split 53 / 62 | epoch 7 | time: 1603.28s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 54 / 62 | epoch 7 | time: 1610.64s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
| end of split 55 / 62 | epoch 7 | time: 1605.28s | valid loss 0.9421 | valid ppl 2.5652 | learning rate 0.3125
| end of split 56 / 62 | epoch 7 | time: 1603.78s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
| end of split 57 / 62 | epoch 7 | time: 1603.91s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
| end of split 58 / 62 | epoch 7 | time: 1605.53s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
| end of split 59 / 62 | epoch 7 | time: 1656.75s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 60 / 62 | epoch 7 | time: 1603.18s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 61 / 62 | epoch 7 | time: 1601.58s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 62 / 62 | epoch 7 | time: 1602.32s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 1 / 62 | epoch 8 | time: 1599.87s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 2 / 62 | epoch 8 | time: 1605.15s | valid loss 0.9420 | valid ppl 2.5650 | learning rate 0.3125
| end of split 3 / 62 | epoch 8 | time: 1604.62s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 4 / 62 | epoch 8 | time: 1604.72s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 5 / 62 | epoch 8 | time: 1637.47s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 6 / 62 | epoch 8 | time: 875.65s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 7 / 62 | epoch 8 | time: 1638.44s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 8 / 62 | epoch 8 | time: 1612.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 9 / 62 | epoch 8 | time: 1621.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 10 / 62 | epoch 8 | time: 1640.27s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 11 / 62 | epoch 8 | time: 1640.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 12 / 62 | epoch 8 | time: 1611.55s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 13 / 62 | epoch 8 | time: 1608.16s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 14 / 62 | epoch 8 | time: 1663.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 15 / 62 | epoch 8 | time: 1668.15s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 16 / 62 | epoch 8 | time: 1652.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 17 / 62 | epoch 8 | time: 1614.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 18 / 62 | epoch 8 | time: 1617.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 19 / 62 | epoch 8 | time: 1628.04s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 20 / 62 | epoch 8 | time: 1624.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 21 / 62 | epoch 8 | time: 1637.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 22 / 62 | epoch 8 | time: 1634.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 23 / 62 | epoch 8 | time: 1620.99s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 24 / 62 | epoch 8 | time: 1616.31s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 25 / 62 | epoch 8 | time: 1611.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 26 / 62 | epoch 8 | time: 1605.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 27 / 62 | epoch 8 | time: 1607.65s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 28 / 62 | epoch 8 | time: 1608.79s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 29 / 62 | epoch 8 | time: 1608.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 30 / 62 | epoch 8 | time: 1612.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 31 / 62 | epoch 8 | time: 1612.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 32 / 62 | epoch 8 | time: 1605.76s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 33 / 62 | epoch 8 | time: 1609.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 34 / 62 | epoch 8 | time: 1611.85s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
| end of split 35 / 62 | epoch 8 | time: 1620.65s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 36 / 62 | epoch 8 | time: 1619.16s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 37 / 62 | epoch 8 | time: 1604.30s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
| end of split 38 / 62 | epoch 8 | time: 1605.41s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
| end of split 39 / 62 | epoch 8 | time: 1639.13s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 40 / 62 | epoch 8 | time: 1614.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 41 / 62 | epoch 8 | time: 1619.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 42 / 62 | epoch 8 | time: 1655.86s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0012
| end of split 43 / 62 | epoch 8 | time: 1652.46s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 44 / 62 | epoch 8 | time: 1622.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 45 / 62 | epoch 8 | time: 1623.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 46 / 62 | epoch 8 | time: 1621.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 47 / 62 | epoch 8 | time: 1619.93s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 48 / 62 | epoch 8 | time: 1626.00s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 49 / 62 | epoch 8 | time: 1619.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 50 / 62 | epoch 8 | time: 1619.02s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 51 / 62 | epoch 8 | time: 1670.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 52 / 62 | epoch 8 | time: 1671.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 53 / 62 | epoch 8 | time: 1675.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 54 / 62 | epoch 8 | time: 1674.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 55 / 62 | epoch 8 | time: 1662.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 56 / 62 | epoch 8 | time: 1656.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 57 / 62 | epoch 8 | time: 1656.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 58 / 62 | epoch 8 | time: 1656.95s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 59 / 62 | epoch 8 | time: 1650.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 60 / 62 | epoch 8 | time: 1621.00s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 61 / 62 | epoch 8 | time: 1621.93s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 62 / 62 | epoch 8 | time: 1619.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 1 / 62 | epoch 9 | time: 1615.83s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 2 / 62 | epoch 9 | time: 1665.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 3 / 62 | epoch 9 | time: 1619.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 4 / 62 | epoch 9 | time: 1618.02s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 5 / 62 | epoch 9 | time: 1615.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 6 / 62 | epoch 9 | time: 1617.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 7 / 62 | epoch 9 | time: 1613.72s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 8 / 62 | epoch 9 | time: 1617.41s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 9 / 62 | epoch 9 | time: 1609.69s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 10 / 62 | epoch 9 | time: 1608.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 11 / 62 | epoch 9 | time: 1619.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 12 / 62 | epoch 9 | time: 1616.51s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 13 / 62 | epoch 9 | time: 1611.11s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 14 / 62 | epoch 9 | time: 1609.59s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 15 / 62 | epoch 9 | time: 1609.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 16 / 62 | epoch 9 | time: 1609.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 17 / 62 | epoch 9 | time: 1610.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 18 / 62 | epoch 9 | time: 1605.49s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 19 / 62 | epoch 9 | time: 1609.29s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 20 / 62 | epoch 9 | time: 1610.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 21 / 62 | epoch 9 | time: 1610.08s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 22 / 62 | epoch 9 | time: 1609.18s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 23 / 62 | epoch 9 | time: 1608.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 24 / 62 | epoch 9 | time: 1609.79s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 25 / 62 | epoch 9 | time: 1608.82s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 26 / 62 | epoch 9 | time: 1609.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 27 / 62 | epoch 9 | time: 1611.33s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 28 / 62 | epoch 9 | time: 1612.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 29 / 62 | epoch 9 | time: 1611.11s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 30 / 62 | epoch 9 | time: 1612.06s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 31 / 62 | epoch 9 | time: 1609.92s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 32 / 62 | epoch 9 | time: 1606.74s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 33 / 62 | epoch 9 | time: 1609.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 34 / 62 | epoch 9 | time: 1610.44s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 35 / 62 | epoch 9 | time: 1613.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 36 / 62 | epoch 9 | time: 1614.50s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 37 / 62 | epoch 9 | time: 1612.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 38 / 62 | epoch 9 | time: 1614.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 39 / 62 | epoch 9 | time: 1616.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 40 / 62 | epoch 9 | time: 1618.87s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 41 / 62 | epoch 9 | time: 1616.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 42 / 62 | epoch 9 | time: 1590.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 43 / 62 | epoch 9 | time: 1588.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 44 / 62 | epoch 9 | time: 1587.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 45 / 62 | epoch 9 | time: 1588.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 46 / 62 | epoch 9 | time: 1599.34s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 47 / 62 | epoch 9 | time: 1601.18s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 48 / 62 | epoch 9 | time: 1601.25s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 49 / 62 | epoch 9 | time: 1602.68s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 50 / 62 | epoch 9 | time: 1601.60s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 51 / 62 | epoch 9 | time: 855.74s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 52 / 62 | epoch 9 | time: 1601.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 53 / 62 | epoch 9 | time: 1600.52s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 54 / 62 | epoch 9 | time: 1596.97s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 55 / 62 | epoch 9 | time: 1594.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 56 / 62 | epoch 9 | time: 1587.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 57 / 62 | epoch 9 | time: 1603.26s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 58 / 62 | epoch 9 | time: 1616.94s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 59 / 62 | epoch 9 | time: 1616.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 60 / 62 | epoch 9 | time: 1618.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 61 / 62 | epoch 9 | time: 1617.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 62 / 62 | epoch 9 | time: 1618.64s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 1 / 62 | epoch 10 | time: 1611.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 2 / 62 | epoch 10 | time: 1613.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 3 / 62 | epoch 10 | time: 1612.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 4 / 62 | epoch 10 | time: 1616.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 5 / 62 | epoch 10 | time: 1614.61s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 6 / 62 | epoch 10 | time: 1616.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 7 / 62 | epoch 10 | time: 1614.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 8 / 62 | epoch 10 | time: 1616.10s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 9 / 62 | epoch 10 | time: 1617.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 10 / 62 | epoch 10 | time: 1616.50s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 11 / 62 | epoch 10 | time: 1614.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 12 / 62 | epoch 10 | time: 1616.48s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 13 / 62 | epoch 10 | time: 1614.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 14 / 62 | epoch 10 | time: 1616.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 15 / 62 | epoch 10 | time: 1617.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 16 / 62 | epoch 10 | time: 1617.48s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 17 / 62 | epoch 10 | time: 1617.70s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 18 / 62 | epoch 10 | time: 1616.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 19 / 62 | epoch 10 | time: 1615.61s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 20 / 62 | epoch 10 | time: 1616.89s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 21 / 62 | epoch 10 | time: 1617.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 22 / 62 | epoch 10 | time: 1615.66s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 23 / 62 | epoch 10 | time: 1617.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 24 / 62 | epoch 10 | time: 1619.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 25 / 62 | epoch 10 | time: 1621.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 26 / 62 | epoch 10 | time: 1619.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 27 / 62 | epoch 10 | time: 1620.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 28 / 62 | epoch 10 | time: 1622.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 29 / 62 | epoch 10 | time: 1624.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 30 / 62 | epoch 10 | time: 1625.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 31 / 62 | epoch 10 | time: 1621.94s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 32 / 62 | epoch 10 | time: 1628.25s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 33 / 62 | epoch 10 | time: 1629.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 34 / 62 | epoch 10 | time: 1630.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 35 / 62 | epoch 10 | time: 1631.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 36 / 62 | epoch 10 | time: 1631.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 37 / 62 | epoch 10 | time: 1634.08s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 38 / 62 | epoch 10 | time: 1634.51s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 39 / 62 | epoch 10 | time: 1634.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 40 / 62 | epoch 10 | time: 1631.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
TEST: valid loss 0.9404 | valid ppl 2.5611