Dmitry Chaplinsky
Release
9c2cbc6
| end of split 1 / 62 | epoch 1 | time: 1583.48s | valid loss 1.4195 | valid ppl 4.1349 | learning rate 20.0000
| end of split 2 / 62 | epoch 1 | time: 1586.99s | valid loss 1.2706 | valid ppl 3.5628 | learning rate 20.0000
| end of split 3 / 62 | epoch 1 | time: 1587.17s | valid loss 1.2056 | valid ppl 3.3386 | learning rate 20.0000
| end of split 4 / 62 | epoch 1 | time: 1588.13s | valid loss 1.1661 | valid ppl 3.2093 | learning rate 20.0000
| end of split 5 / 62 | epoch 1 | time: 1588.33s | valid loss 1.1408 | valid ppl 3.1294 | learning rate 20.0000
| end of split 6 / 62 | epoch 1 | time: 1587.62s | valid loss 1.1212 | valid ppl 3.0685 | learning rate 20.0000
| end of split 7 / 62 | epoch 1 | time: 1587.56s | valid loss 1.1058 | valid ppl 3.0217 | learning rate 20.0000
| end of split 8 / 62 | epoch 1 | time: 1588.13s | valid loss 1.0983 | valid ppl 2.9990 | learning rate 20.0000
| end of split 9 / 62 | epoch 1 | time: 1586.70s | valid loss 1.0876 | valid ppl 2.9671 | learning rate 20.0000
| end of split 10 / 62 | epoch 1 | time: 1585.61s | valid loss 1.0829 | valid ppl 2.9534 | learning rate 20.0000
| end of split 11 / 62 | epoch 1 | time: 1585.50s | valid loss 1.0744 | valid ppl 2.9282 | learning rate 20.0000
| end of split 12 / 62 | epoch 1 | time: 1583.26s | valid loss 1.0666 | valid ppl 2.9055 | learning rate 20.0000
| end of split 13 / 62 | epoch 1 | time: 1584.36s | valid loss 1.0616 | valid ppl 2.8911 | learning rate 20.0000
| end of split 14 / 62 | epoch 1 | time: 1585.50s | valid loss 1.0568 | valid ppl 2.8771 | learning rate 20.0000
| end of split 15 / 62 | epoch 1 | time: 1586.30s | valid loss 1.1435 | valid ppl 3.1378 | learning rate 20.0000
| end of split 16 / 62 | epoch 1 | time: 1590.72s | valid loss 1.0505 | valid ppl 2.8592 | learning rate 20.0000
| end of split 17 / 62 | epoch 1 | time: 1617.21s | valid loss 1.0468 | valid ppl 2.8484 | learning rate 20.0000
| end of split 18 / 62 | epoch 1 | time: 1606.50s | valid loss 1.0429 | valid ppl 2.8374 | learning rate 20.0000
| end of split 19 / 62 | epoch 1 | time: 1600.44s | valid loss 1.0395 | valid ppl 2.8278 | learning rate 20.0000
| end of split 20 / 62 | epoch 1 | time: 1593.91s | valid loss 1.0392 | valid ppl 2.8268 | learning rate 20.0000
| end of split 21 / 62 | epoch 1 | time: 1607.71s | valid loss 1.0325 | valid ppl 2.8081 | learning rate 20.0000
| end of split 22 / 62 | epoch 1 | time: 1603.04s | valid loss 1.0321 | valid ppl 2.8070 | learning rate 20.0000
| end of split 23 / 62 | epoch 1 | time: 1602.89s | valid loss 1.0292 | valid ppl 2.7988 | learning rate 20.0000
| end of split 24 / 62 | epoch 1 | time: 1606.15s | valid loss 1.0284 | valid ppl 2.7965 | learning rate 20.0000
| end of split 25 / 62 | epoch 1 | time: 1583.05s | valid loss 1.0251 | valid ppl 2.7874 | learning rate 20.0000
| end of split 26 / 62 | epoch 1 | time: 1580.57s | valid loss 1.0232 | valid ppl 2.7820 | learning rate 20.0000
| end of split 27 / 62 | epoch 1 | time: 1578.17s | valid loss 1.0218 | valid ppl 2.7783 | learning rate 20.0000
| end of split 28 / 62 | epoch 1 | time: 1577.71s | valid loss 1.0200 | valid ppl 2.7732 | learning rate 20.0000
| end of split 29 / 62 | epoch 1 | time: 1577.12s | valid loss 1.0258 | valid ppl 2.7895 | learning rate 20.0000
| end of split 30 / 62 | epoch 1 | time: 1577.09s | valid loss 1.0195 | valid ppl 2.7719 | learning rate 20.0000
| end of split 31 / 62 | epoch 1 | time: 1575.70s | valid loss 1.0191 | valid ppl 2.7706 | learning rate 20.0000
| end of split 32 / 62 | epoch 1 | time: 1576.02s | valid loss 1.0141 | valid ppl 2.7570 | learning rate 20.0000
| end of split 33 / 62 | epoch 1 | time: 1575.11s | valid loss 1.0111 | valid ppl 2.7486 | learning rate 20.0000
| end of split 34 / 62 | epoch 1 | time: 1574.68s | valid loss 1.0315 | valid ppl 2.8053 | learning rate 20.0000
| end of split 35 / 62 | epoch 1 | time: 1575.54s | valid loss 1.0103 | valid ppl 2.7463 | learning rate 20.0000
| end of split 36 / 62 | epoch 1 | time: 1578.17s | valid loss 1.0089 | valid ppl 2.7425 | learning rate 20.0000
| end of split 37 / 62 | epoch 1 | time: 1581.60s | valid loss 1.0098 | valid ppl 2.7450 | learning rate 20.0000
| end of split 38 / 62 | epoch 1 | time: 1590.23s | valid loss 1.0059 | valid ppl 2.7345 | learning rate 20.0000
| end of split 39 / 62 | epoch 1 | time: 1591.84s | valid loss 1.0313 | valid ppl 2.8048 | learning rate 20.0000
| end of split 40 / 62 | epoch 1 | time: 1592.79s | valid loss 1.0059 | valid ppl 2.7344 | learning rate 20.0000
| end of split 41 / 62 | epoch 1 | time: 1591.62s | valid loss 1.0026 | valid ppl 2.7253 | learning rate 20.0000
| end of split 42 / 62 | epoch 1 | time: 1611.75s | valid loss 1.0035 | valid ppl 2.7277 | learning rate 20.0000
| end of split 43 / 62 | epoch 1 | time: 1618.56s | valid loss 1.0010 | valid ppl 2.7210 | learning rate 20.0000
| end of split 44 / 62 | epoch 1 | time: 1623.11s | valid loss 1.0031 | valid ppl 2.7267 | learning rate 20.0000
| end of split 45 / 62 | epoch 1 | time: 1624.39s | valid loss 0.9990 | valid ppl 2.7156 | learning rate 20.0000
| end of split 46 / 62 | epoch 1 | time: 1627.72s | valid loss 0.9990 | valid ppl 2.7157 | learning rate 20.0000
| end of split 47 / 62 | epoch 1 | time: 1627.58s | valid loss 1.0122 | valid ppl 2.7516 | learning rate 20.0000
| end of split 48 / 62 | epoch 1 | time: 1626.44s | valid loss 0.9964 | valid ppl 2.7084 | learning rate 20.0000
| end of split 49 / 62 | epoch 1 | time: 1625.87s | valid loss 0.9977 | valid ppl 2.7120 | learning rate 20.0000
| end of split 50 / 62 | epoch 1 | time: 1626.88s | valid loss 0.9963 | valid ppl 2.7082 | learning rate 20.0000
| end of split 51 / 62 | epoch 1 | time: 1629.08s | valid loss 0.9958 | valid ppl 2.7069 | learning rate 20.0000
| end of split 52 / 62 | epoch 1 | time: 1629.12s | valid loss 1.0030 | valid ppl 2.7264 | learning rate 20.0000
| end of split 53 / 62 | epoch 1 | time: 1628.87s | valid loss 0.9934 | valid ppl 2.7005 | learning rate 20.0000
| end of split 54 / 62 | epoch 1 | time: 1629.78s | valid loss 0.9930 | valid ppl 2.6994 | learning rate 20.0000
| end of split 55 / 62 | epoch 1 | time: 1628.40s | valid loss 0.9921 | valid ppl 2.6968 | learning rate 20.0000
| end of split 56 / 62 | epoch 1 | time: 1626.37s | valid loss 0.9927 | valid ppl 2.6984 | learning rate 20.0000
| end of split 57 / 62 | epoch 1 | time: 1627.36s | valid loss 0.9918 | valid ppl 2.6961 | learning rate 20.0000
| end of split 58 / 62 | epoch 1 | time: 1625.21s | valid loss 0.9900 | valid ppl 2.6912 | learning rate 20.0000
| end of split 59 / 62 | epoch 1 | time: 1626.91s | valid loss 0.9888 | valid ppl 2.6880 | learning rate 20.0000
| end of split 60 / 62 | epoch 1 | time: 1627.73s | valid loss 0.9964 | valid ppl 2.7086 | learning rate 20.0000
| end of split 61 / 62 | epoch 1 | time: 1626.02s | valid loss 0.9890 | valid ppl 2.6886 | learning rate 20.0000
| end of split 62 / 62 | epoch 1 | time: 869.09s | valid loss 0.9974 | valid ppl 2.7112 | learning rate 20.0000
| end of split 1 / 62 | epoch 2 | time: 1622.25s | valid loss 0.9901 | valid ppl 2.6916 | learning rate 20.0000
| end of split 2 / 62 | epoch 2 | time: 1625.45s | valid loss 0.9873 | valid ppl 2.6839 | learning rate 20.0000
| end of split 3 / 62 | epoch 2 | time: 1623.22s | valid loss 0.9864 | valid ppl 2.6816 | learning rate 20.0000
| end of split 4 / 62 | epoch 2 | time: 1623.07s | valid loss 0.9877 | valid ppl 2.6851 | learning rate 20.0000
| end of split 5 / 62 | epoch 2 | time: 1620.60s | valid loss 1.0115 | valid ppl 2.7496 | learning rate 20.0000
| end of split 6 / 62 | epoch 2 | time: 1622.51s | valid loss 0.9890 | valid ppl 2.6887 | learning rate 20.0000
| end of split 7 / 62 | epoch 2 | time: 1620.37s | valid loss 0.9862 | valid ppl 2.6811 | learning rate 20.0000
| end of split 8 / 62 | epoch 2 | time: 1620.70s | valid loss 0.9869 | valid ppl 2.6828 | learning rate 20.0000
| end of split 9 / 62 | epoch 2 | time: 1619.16s | valid loss 0.9861 | valid ppl 2.6808 | learning rate 20.0000
| end of split 10 / 62 | epoch 2 | time: 1617.83s | valid loss 0.9867 | valid ppl 2.6822 | learning rate 20.0000
| end of split 11 / 62 | epoch 2 | time: 1618.28s | valid loss 1.0056 | valid ppl 2.7335 | learning rate 20.0000
| end of split 12 / 62 | epoch 2 | time: 1615.81s | valid loss 0.9829 | valid ppl 2.6723 | learning rate 20.0000
| end of split 13 / 62 | epoch 2 | time: 1615.59s | valid loss 0.9849 | valid ppl 2.6776 | learning rate 20.0000
| end of split 14 / 62 | epoch 2 | time: 1616.05s | valid loss 0.9907 | valid ppl 2.6930 | learning rate 20.0000
| end of split 15 / 62 | epoch 2 | time: 863.11s | valid loss 0.9904 | valid ppl 2.6922 | learning rate 20.0000
| end of split 16 / 62 | epoch 2 | time: 1614.44s | valid loss 0.9823 | valid ppl 2.6705 | learning rate 20.0000
| end of split 17 / 62 | epoch 2 | time: 1612.68s | valid loss 0.9824 | valid ppl 2.6708 | learning rate 20.0000
| end of split 18 / 62 | epoch 2 | time: 1608.56s | valid loss 0.9810 | valid ppl 2.6670 | learning rate 20.0000
| end of split 19 / 62 | epoch 2 | time: 1585.34s | valid loss 0.9799 | valid ppl 2.6641 | learning rate 20.0000
| end of split 20 / 62 | epoch 2 | time: 1582.65s | valid loss 0.9801 | valid ppl 2.6647 | learning rate 20.0000
| end of split 21 / 62 | epoch 2 | time: 1581.78s | valid loss 0.9804 | valid ppl 2.6656 | learning rate 20.0000
| end of split 22 / 62 | epoch 2 | time: 1583.27s | valid loss 0.9791 | valid ppl 2.6620 | learning rate 20.0000
| end of split 23 / 62 | epoch 2 | time: 1580.74s | valid loss 0.9780 | valid ppl 2.6590 | learning rate 20.0000
| end of split 24 / 62 | epoch 2 | time: 1581.13s | valid loss 0.9782 | valid ppl 2.6597 | learning rate 20.0000
| end of split 25 / 62 | epoch 2 | time: 1580.34s | valid loss 0.9795 | valid ppl 2.6631 | learning rate 20.0000
| end of split 26 / 62 | epoch 2 | time: 1580.35s | valid loss 0.9782 | valid ppl 2.6597 | learning rate 20.0000
| end of split 27 / 62 | epoch 2 | time: 1579.55s | valid loss 0.9780 | valid ppl 2.6592 | learning rate 20.0000
| end of split 28 / 62 | epoch 2 | time: 1583.05s | valid loss 0.9850 | valid ppl 2.6778 | learning rate 20.0000
| end of split 29 / 62 | epoch 2 | time: 1580.68s | valid loss 0.9822 | valid ppl 2.6702 | learning rate 20.0000
| end of split 30 / 62 | epoch 2 | time: 1577.58s | valid loss 0.9923 | valid ppl 2.6973 | learning rate 20.0000
| end of split 31 / 62 | epoch 2 | time: 1581.85s | valid loss 0.9764 | valid ppl 2.6550 | learning rate 20.0000
| end of split 32 / 62 | epoch 2 | time: 1585.87s | valid loss 0.9760 | valid ppl 2.6537 | learning rate 20.0000
| end of split 33 / 62 | epoch 2 | time: 1588.93s | valid loss 0.9758 | valid ppl 2.6533 | learning rate 20.0000
| end of split 34 / 62 | epoch 2 | time: 1590.44s | valid loss 0.9759 | valid ppl 2.6536 | learning rate 20.0000
| end of split 35 / 62 | epoch 2 | time: 1592.53s | valid loss 0.9758 | valid ppl 2.6532 | learning rate 20.0000
| end of split 36 / 62 | epoch 2 | time: 1594.13s | valid loss 0.9758 | valid ppl 2.6532 | learning rate 20.0000
| end of split 37 / 62 | epoch 2 | time: 1592.90s | valid loss 0.9737 | valid ppl 2.6476 | learning rate 20.0000
| end of split 38 / 62 | epoch 2 | time: 1594.82s | valid loss 0.9736 | valid ppl 2.6474 | learning rate 20.0000
| end of split 39 / 62 | epoch 2 | time: 1596.77s | valid loss 0.9754 | valid ppl 2.6521 | learning rate 20.0000
| end of split 40 / 62 | epoch 2 | time: 1599.71s | valid loss 0.9753 | valid ppl 2.6520 | learning rate 20.0000
| end of split 41 / 62 | epoch 2 | time: 1603.63s | valid loss 0.9745 | valid ppl 2.6498 | learning rate 20.0000
| end of split 42 / 62 | epoch 2 | time: 1608.89s | valid loss 0.9734 | valid ppl 2.6470 | learning rate 20.0000
| end of split 43 / 62 | epoch 2 | time: 1609.09s | valid loss 0.9725 | valid ppl 2.6445 | learning rate 20.0000
| end of split 44 / 62 | epoch 2 | time: 1602.88s | valid loss 0.9728 | valid ppl 2.6454 | learning rate 20.0000
| end of split 45 / 62 | epoch 2 | time: 1598.34s | valid loss 0.9721 | valid ppl 2.6434 | learning rate 20.0000
| end of split 46 / 62 | epoch 2 | time: 1600.19s | valid loss 0.9719 | valid ppl 2.6430 | learning rate 20.0000
| end of split 47 / 62 | epoch 2 | time: 1601.67s | valid loss 0.9719 | valid ppl 2.6431 | learning rate 20.0000
| end of split 48 / 62 | epoch 2 | time: 1605.64s | valid loss 0.9719 | valid ppl 2.6428 | learning rate 20.0000
| end of split 49 / 62 | epoch 2 | time: 1604.96s | valid loss 0.9710 | valid ppl 2.6406 | learning rate 20.0000
| end of split 50 / 62 | epoch 2 | time: 1603.96s | valid loss 0.9715 | valid ppl 2.6420 | learning rate 20.0000
| end of split 51 / 62 | epoch 2 | time: 1609.00s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
| end of split 52 / 62 | epoch 2 | time: 1609.47s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
| end of split 53 / 62 | epoch 2 | time: 1607.14s | valid loss 0.9725 | valid ppl 2.6447 | learning rate 20.0000
| end of split 54 / 62 | epoch 2 | time: 1606.27s | valid loss 0.9706 | valid ppl 2.6396 | learning rate 20.0000
| end of split 55 / 62 | epoch 2 | time: 1607.85s | valid loss 0.9706 | valid ppl 2.6395 | learning rate 20.0000
| end of split 56 / 62 | epoch 2 | time: 1607.99s | valid loss 0.9727 | valid ppl 2.6451 | learning rate 20.0000
| end of split 57 / 62 | epoch 2 | time: 1609.15s | valid loss 0.9696 | valid ppl 2.6368 | learning rate 20.0000
| end of split 58 / 62 | epoch 2 | time: 1606.21s | valid loss 0.9691 | valid ppl 2.6355 | learning rate 20.0000
| end of split 59 / 62 | epoch 2 | time: 1606.97s | valid loss 0.9684 | valid ppl 2.6337 | learning rate 20.0000
| end of split 60 / 62 | epoch 2 | time: 1605.30s | valid loss 0.9686 | valid ppl 2.6341 | learning rate 20.0000
| end of split 61 / 62 | epoch 2 | time: 1606.09s | valid loss 0.9678 | valid ppl 2.6322 | learning rate 20.0000
| end of split 62 / 62 | epoch 2 | time: 1604.24s | valid loss 0.9692 | valid ppl 2.6359 | learning rate 20.0000
| end of split 1 / 62 | epoch 3 | time: 1595.63s | valid loss 0.9704 | valid ppl 2.6389 | learning rate 20.0000
| end of split 2 / 62 | epoch 3 | time: 1599.02s | valid loss 0.9697 | valid ppl 2.6373 | learning rate 20.0000
| end of split 3 / 62 | epoch 3 | time: 1599.83s | valid loss 0.9676 | valid ppl 2.6315 | learning rate 20.0000
| end of split 4 / 62 | epoch 3 | time: 1601.68s | valid loss 0.9684 | valid ppl 2.6337 | learning rate 20.0000
| end of split 5 / 62 | epoch 3 | time: 1600.81s | valid loss 0.9697 | valid ppl 2.6372 | learning rate 20.0000
| end of split 6 / 62 | epoch 3 | time: 1601.85s | valid loss 0.9692 | valid ppl 2.6359 | learning rate 20.0000
| end of split 7 / 62 | epoch 3 | time: 1599.16s | valid loss 0.9675 | valid ppl 2.6314 | learning rate 20.0000
| end of split 8 / 62 | epoch 3 | time: 1599.83s | valid loss 0.9686 | valid ppl 2.6342 | learning rate 20.0000
| end of split 9 / 62 | epoch 3 | time: 1587.43s | valid loss 0.9669 | valid ppl 2.6298 | learning rate 20.0000
| end of split 10 / 62 | epoch 3 | time: 1588.81s | valid loss 0.9677 | valid ppl 2.6318 | learning rate 20.0000
| end of split 11 / 62 | epoch 3 | time: 1590.43s | valid loss 0.9673 | valid ppl 2.6307 | learning rate 20.0000
| end of split 12 / 62 | epoch 3 | time: 1592.90s | valid loss 0.9668 | valid ppl 2.6296 | learning rate 20.0000
| end of split 13 / 62 | epoch 3 | time: 1594.36s | valid loss 0.9676 | valid ppl 2.6317 | learning rate 20.0000
| end of split 14 / 62 | epoch 3 | time: 1595.81s | valid loss 0.9652 | valid ppl 2.6254 | learning rate 20.0000
| end of split 15 / 62 | epoch 3 | time: 1596.70s | valid loss 0.9659 | valid ppl 2.6271 | learning rate 20.0000
| end of split 16 / 62 | epoch 3 | time: 1591.94s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
| end of split 17 / 62 | epoch 3 | time: 1584.49s | valid loss 0.9656 | valid ppl 2.6262 | learning rate 20.0000
| end of split 18 / 62 | epoch 3 | time: 1585.57s | valid loss 0.9649 | valid ppl 2.6245 | learning rate 20.0000
| end of split 19 / 62 | epoch 3 | time: 1579.95s | valid loss 0.9650 | valid ppl 2.6248 | learning rate 20.0000
| end of split 20 / 62 | epoch 3 | time: 843.60s | valid loss 0.9738 | valid ppl 2.6480 | learning rate 20.0000
| end of split 21 / 62 | epoch 3 | time: 1580.19s | valid loss 0.9780 | valid ppl 2.6592 | learning rate 20.0000
| end of split 22 / 62 | epoch 3 | time: 1582.17s | valid loss 1.0091 | valid ppl 2.7433 | learning rate 20.0000
| end of split 23 / 62 | epoch 3 | time: 1582.31s | valid loss 0.9639 | valid ppl 2.6220 | learning rate 20.0000
| end of split 24 / 62 | epoch 3 | time: 1582.57s | valid loss 0.9828 | valid ppl 2.6720 | learning rate 20.0000
| end of split 25 / 62 | epoch 3 | time: 1582.46s | valid loss 0.9636 | valid ppl 2.6210 | learning rate 20.0000
| end of split 26 / 62 | epoch 3 | time: 1585.02s | valid loss 0.9653 | valid ppl 2.6255 | learning rate 20.0000
| end of split 27 / 62 | epoch 3 | time: 1584.48s | valid loss 0.9638 | valid ppl 2.6216 | learning rate 20.0000
| end of split 28 / 62 | epoch 3 | time: 1585.97s | valid loss 0.9641 | valid ppl 2.6225 | learning rate 20.0000
| end of split 29 / 62 | epoch 3 | time: 1588.62s | valid loss 0.9630 | valid ppl 2.6195 | learning rate 20.0000
| end of split 30 / 62 | epoch 3 | time: 1605.99s | valid loss 0.9626 | valid ppl 2.6184 | learning rate 20.0000
| end of split 31 / 62 | epoch 3 | time: 1627.59s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
| end of split 32 / 62 | epoch 3 | time: 1600.91s | valid loss 0.9649 | valid ppl 2.6246 | learning rate 20.0000
| end of split 33 / 62 | epoch 3 | time: 1607.37s | valid loss 0.9635 | valid ppl 2.6208 | learning rate 20.0000
| end of split 34 / 62 | epoch 3 | time: 1605.43s | valid loss 0.9619 | valid ppl 2.6166 | learning rate 20.0000
| end of split 35 / 62 | epoch 3 | time: 1606.13s | valid loss 0.9621 | valid ppl 2.6173 | learning rate 20.0000
| end of split 36 / 62 | epoch 3 | time: 1604.60s | valid loss 0.9622 | valid ppl 2.6175 | learning rate 20.0000
| end of split 37 / 62 | epoch 3 | time: 1606.96s | valid loss 0.9620 | valid ppl 2.6170 | learning rate 20.0000
| end of split 38 / 62 | epoch 3 | time: 1604.31s | valid loss 0.9615 | valid ppl 2.6157 | learning rate 20.0000
| end of split 39 / 62 | epoch 3 | time: 1603.46s | valid loss 0.9618 | valid ppl 2.6165 | learning rate 20.0000
| end of split 40 / 62 | epoch 3 | time: 1602.53s | valid loss 0.9613 | valid ppl 2.6151 | learning rate 20.0000
| end of split 41 / 62 | epoch 3 | time: 1602.02s | valid loss 0.9613 | valid ppl 2.6151 | learning rate 20.0000
| end of split 42 / 62 | epoch 3 | time: 1601.02s | valid loss 0.9618 | valid ppl 2.6165 | learning rate 20.0000
| end of split 43 / 62 | epoch 3 | time: 1602.13s | valid loss 0.9694 | valid ppl 2.6364 | learning rate 20.0000
| end of split 44 / 62 | epoch 3 | time: 1605.32s | valid loss 0.9612 | valid ppl 2.6149 | learning rate 20.0000
| end of split 45 / 62 | epoch 3 | time: 1607.04s | valid loss 0.9808 | valid ppl 2.6667 | learning rate 20.0000
| end of split 46 / 62 | epoch 3 | time: 1600.96s | valid loss 0.9597 | valid ppl 2.6108 | learning rate 20.0000
| end of split 47 / 62 | epoch 3 | time: 1602.97s | valid loss 0.9597 | valid ppl 2.6109 | learning rate 20.0000
| end of split 48 / 62 | epoch 3 | time: 1600.73s | valid loss 0.9657 | valid ppl 2.6267 | learning rate 20.0000
| end of split 49 / 62 | epoch 3 | time: 1601.65s | valid loss 0.9614 | valid ppl 2.6154 | learning rate 20.0000
| end of split 50 / 62 | epoch 3 | time: 1601.78s | valid loss 0.9603 | valid ppl 2.6124 | learning rate 20.0000
| end of split 51 / 62 | epoch 3 | time: 1601.02s | valid loss 0.9593 | valid ppl 2.6098 | learning rate 20.0000
| end of split 52 / 62 | epoch 3 | time: 1600.92s | valid loss 0.9607 | valid ppl 2.6136 | learning rate 20.0000
| end of split 53 / 62 | epoch 3 | time: 1601.95s | valid loss 0.9604 | valid ppl 2.6127 | learning rate 20.0000
| end of split 54 / 62 | epoch 3 | time: 1600.51s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
| end of split 55 / 62 | epoch 3 | time: 1599.14s | valid loss 0.9588 | valid ppl 2.6086 | learning rate 20.0000
| end of split 56 / 62 | epoch 3 | time: 1599.72s | valid loss 0.9602 | valid ppl 2.6123 | learning rate 20.0000
| end of split 57 / 62 | epoch 3 | time: 1597.65s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
| end of split 58 / 62 | epoch 3 | time: 1598.97s | valid loss 0.9593 | valid ppl 2.6098 | learning rate 20.0000
| end of split 59 / 62 | epoch 3 | time: 1601.81s | valid loss 0.9589 | valid ppl 2.6089 | learning rate 20.0000
| end of split 60 / 62 | epoch 3 | time: 1600.21s | valid loss 0.9599 | valid ppl 2.6115 | learning rate 20.0000
| end of split 61 / 62 | epoch 3 | time: 1598.25s | valid loss 0.9595 | valid ppl 2.6103 | learning rate 20.0000
| end of split 62 / 62 | epoch 3 | time: 1600.01s | valid loss 0.9584 | valid ppl 2.6075 | learning rate 20.0000
| end of split 1 / 62 | epoch 4 | time: 1595.62s | valid loss 0.9586 | valid ppl 2.6081 | learning rate 20.0000
| end of split 2 / 62 | epoch 4 | time: 1593.94s | valid loss 0.9598 | valid ppl 2.6110 | learning rate 20.0000
| end of split 3 / 62 | epoch 4 | time: 1595.86s | valid loss 0.9592 | valid ppl 2.6096 | learning rate 20.0000
| end of split 4 / 62 | epoch 4 | time: 852.38s | valid loss 0.9646 | valid ppl 2.6237 | learning rate 20.0000
| end of split 5 / 62 | epoch 4 | time: 1596.43s | valid loss 0.9590 | valid ppl 2.6091 | learning rate 20.0000
| end of split 6 / 62 | epoch 4 | time: 1594.53s | valid loss 0.9584 | valid ppl 2.6075 | learning rate 20.0000
| end of split 7 / 62 | epoch 4 | time: 1594.97s | valid loss 0.9573 | valid ppl 2.6046 | learning rate 20.0000
| end of split 8 / 62 | epoch 4 | time: 1594.23s | valid loss 0.9579 | valid ppl 2.6062 | learning rate 20.0000
| end of split 9 / 62 | epoch 4 | time: 1594.24s | valid loss 0.9580 | valid ppl 2.6065 | learning rate 20.0000
| end of split 10 / 62 | epoch 4 | time: 1591.80s | valid loss 0.9578 | valid ppl 2.6059 | learning rate 20.0000
| end of split 11 / 62 | epoch 4 | time: 1580.82s | valid loss 0.9572 | valid ppl 2.6043 | learning rate 20.0000
| end of split 12 / 62 | epoch 4 | time: 1578.60s | valid loss 0.9580 | valid ppl 2.6064 | learning rate 20.0000
| end of split 13 / 62 | epoch 4 | time: 1580.22s | valid loss 0.9585 | valid ppl 2.6079 | learning rate 20.0000
| end of split 14 / 62 | epoch 4 | time: 1578.77s | valid loss 0.9627 | valid ppl 2.6189 | learning rate 20.0000
| end of split 15 / 62 | epoch 4 | time: 1579.05s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
| end of split 16 / 62 | epoch 4 | time: 1577.56s | valid loss 0.9568 | valid ppl 2.6034 | learning rate 20.0000
| end of split 17 / 62 | epoch 4 | time: 1578.26s | valid loss 0.9572 | valid ppl 2.6044 | learning rate 20.0000
| end of split 18 / 62 | epoch 4 | time: 1579.21s | valid loss 0.9566 | valid ppl 2.6027 | learning rate 20.0000
| end of split 19 / 62 | epoch 4 | time: 1578.77s | valid loss 0.9567 | valid ppl 2.6030 | learning rate 20.0000
| end of split 20 / 62 | epoch 4 | time: 1576.14s | valid loss 0.9584 | valid ppl 2.6076 | learning rate 20.0000
| end of split 21 / 62 | epoch 4 | time: 1576.68s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 20.0000
| end of split 22 / 62 | epoch 4 | time: 1576.80s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
| end of split 23 / 62 | epoch 4 | time: 1576.23s | valid loss 0.9744 | valid ppl 2.6496 | learning rate 20.0000
| end of split 24 / 62 | epoch 4 | time: 1575.49s | valid loss 0.9566 | valid ppl 2.6028 | learning rate 20.0000
| end of split 25 / 62 | epoch 4 | time: 1577.44s | valid loss 0.9555 | valid ppl 2.6000 | learning rate 20.0000
| end of split 26 / 62 | epoch 4 | time: 1577.10s | valid loss 0.9564 | valid ppl 2.6024 | learning rate 20.0000
| end of split 27 / 62 | epoch 4 | time: 1576.83s | valid loss 0.9560 | valid ppl 2.6012 | learning rate 20.0000
| end of split 28 / 62 | epoch 4 | time: 1588.94s | valid loss 0.9567 | valid ppl 2.6031 | learning rate 20.0000
| end of split 29 / 62 | epoch 4 | time: 1591.83s | valid loss 0.9554 | valid ppl 2.5996 | learning rate 20.0000
| end of split 30 / 62 | epoch 4 | time: 1603.93s | valid loss 0.9554 | valid ppl 2.5997 | learning rate 20.0000
| end of split 31 / 62 | epoch 4 | time: 1595.76s | valid loss 0.9549 | valid ppl 2.5985 | learning rate 20.0000
| end of split 32 / 62 | epoch 4 | time: 1711.81s | valid loss 0.9561 | valid ppl 2.6015 | learning rate 20.0000
| end of split 33 / 62 | epoch 4 | time: 1577.07s | valid loss 0.9577 | valid ppl 2.6058 | learning rate 20.0000
| end of split 34 / 62 | epoch 4 | time: 1576.41s | valid loss 0.9546 | valid ppl 2.5978 | learning rate 20.0000
| end of split 35 / 62 | epoch 4 | time: 1577.72s | valid loss 0.9552 | valid ppl 2.5991 | learning rate 20.0000
| end of split 36 / 62 | epoch 4 | time: 1577.03s | valid loss 0.9553 | valid ppl 2.5995 | learning rate 20.0000
| end of split 37 / 62 | epoch 4 | time: 1578.71s | valid loss 0.9544 | valid ppl 2.5972 | learning rate 20.0000
| end of split 38 / 62 | epoch 4 | time: 1577.03s | valid loss 0.9559 | valid ppl 2.6011 | learning rate 20.0000
| end of split 39 / 62 | epoch 4 | time: 1630.11s | valid loss 0.9540 | valid ppl 2.5962 | learning rate 20.0000
| end of split 40 / 62 | epoch 4 | time: 1579.09s | valid loss 0.9558 | valid ppl 2.6007 | learning rate 20.0000
| end of split 41 / 62 | epoch 4 | time: 1578.58s | valid loss 0.9538 | valid ppl 2.5956 | learning rate 20.0000
| end of split 42 / 62 | epoch 4 | time: 1579.44s | valid loss 0.9541 | valid ppl 2.5964 | learning rate 20.0000
| end of split 43 / 62 | epoch 4 | time: 1577.04s | valid loss 0.9544 | valid ppl 2.5971 | learning rate 20.0000
| end of split 44 / 62 | epoch 4 | time: 1576.88s | valid loss 0.9544 | valid ppl 2.5972 | learning rate 20.0000
| end of split 45 / 62 | epoch 4 | time: 1578.62s | valid loss 0.9600 | valid ppl 2.6116 | learning rate 20.0000
| end of split 46 / 62 | epoch 4 | time: 1577.25s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
| end of split 47 / 62 | epoch 4 | time: 1577.78s | valid loss 0.9554 | valid ppl 2.5996 | learning rate 20.0000
| end of split 48 / 62 | epoch 4 | time: 1577.99s | valid loss 0.9545 | valid ppl 2.5974 | learning rate 20.0000
| end of split 49 / 62 | epoch 4 | time: 1575.73s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 20.0000
| end of split 50 / 62 | epoch 4 | time: 1574.23s | valid loss 0.9535 | valid ppl 2.5947 | learning rate 20.0000
| end of split 51 / 62 | epoch 4 | time: 1575.99s | valid loss 0.9623 | valid ppl 2.6176 | learning rate 20.0000
| end of split 52 / 62 | epoch 4 | time: 1575.37s | valid loss 0.9954 | valid ppl 2.7058 | learning rate 20.0000
| end of split 53 / 62 | epoch 4 | time: 1574.08s | valid loss 0.9561 | valid ppl 2.6014 | learning rate 20.0000
| end of split 54 / 62 | epoch 4 | time: 1575.32s | valid loss 0.9543 | valid ppl 2.5968 | learning rate 20.0000
| end of split 55 / 62 | epoch 4 | time: 1575.06s | valid loss 0.9541 | valid ppl 2.5962 | learning rate 20.0000
| end of split 56 / 62 | epoch 4 | time: 1575.80s | valid loss 0.9713 | valid ppl 2.6413 | learning rate 20.0000
| end of split 57 / 62 | epoch 4 | time: 1577.19s | valid loss 0.9556 | valid ppl 2.6003 | learning rate 20.0000
| end of split 58 / 62 | epoch 4 | time: 1576.21s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
| end of split 59 / 62 | epoch 4 | time: 1577.08s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 20.0000
| end of split 60 / 62 | epoch 4 | time: 1574.14s | valid loss 0.9572 | valid ppl 2.6045 | learning rate 20.0000
| end of split 61 / 62 | epoch 4 | time: 1571.90s | valid loss 0.9549 | valid ppl 2.5984 | learning rate 20.0000
| end of split 62 / 62 | epoch 4 | time: 1572.26s | valid loss 0.9482 | valid ppl 2.5811 | learning rate 5.0000
| end of split 1 / 62 | epoch 5 | time: 1570.96s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
| end of split 2 / 62 | epoch 5 | time: 1573.43s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
| end of split 3 / 62 | epoch 5 | time: 1573.08s | valid loss 0.9472 | valid ppl 2.5786 | learning rate 5.0000
| end of split 4 / 62 | epoch 5 | time: 1572.80s | valid loss 0.9474 | valid ppl 2.5789 | learning rate 5.0000
| end of split 5 / 62 | epoch 5 | time: 1572.50s | valid loss 0.9477 | valid ppl 2.5798 | learning rate 5.0000
| end of split 6 / 62 | epoch 5 | time: 1574.27s | valid loss 0.9469 | valid ppl 2.5777 | learning rate 5.0000
| end of split 7 / 62 | epoch 5 | time: 1575.64s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
| end of split 8 / 62 | epoch 5 | time: 1577.81s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
| end of split 9 / 62 | epoch 5 | time: 1578.61s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
| end of split 10 / 62 | epoch 5 | time: 1580.32s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
| end of split 11 / 62 | epoch 5 | time: 1581.85s | valid loss 0.9467 | valid ppl 2.5771 | learning rate 5.0000
| end of split 12 / 62 | epoch 5 | time: 1582.22s | valid loss 0.9466 | valid ppl 2.5769 | learning rate 5.0000
| end of split 13 / 62 | epoch 5 | time: 1581.45s | valid loss 0.9466 | valid ppl 2.5769 | learning rate 5.0000
| end of split 14 / 62 | epoch 5 | time: 1579.73s | valid loss 0.9466 | valid ppl 2.5770 | learning rate 5.0000
| end of split 15 / 62 | epoch 5 | time: 1581.60s | valid loss 0.9466 | valid ppl 2.5768 | learning rate 5.0000
| end of split 16 / 62 | epoch 5 | time: 1577.02s | valid loss 0.9463 | valid ppl 2.5761 | learning rate 5.0000
| end of split 17 / 62 | epoch 5 | time: 1576.46s | valid loss 0.9465 | valid ppl 2.5768 | learning rate 5.0000
| end of split 18 / 62 | epoch 5 | time: 1577.82s | valid loss 0.9472 | valid ppl 2.5785 | learning rate 5.0000
| end of split 19 / 62 | epoch 5 | time: 1579.10s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
| end of split 20 / 62 | epoch 5 | time: 1579.00s | valid loss 0.9462 | valid ppl 2.5760 | learning rate 5.0000
| end of split 21 / 62 | epoch 5 | time: 1579.61s | valid loss 0.9461 | valid ppl 2.5757 | learning rate 5.0000
| end of split 22 / 62 | epoch 5 | time: 1580.98s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
| end of split 23 / 62 | epoch 5 | time: 1581.08s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 24 / 62 | epoch 5 | time: 1581.18s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
| end of split 25 / 62 | epoch 5 | time: 1579.63s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 26 / 62 | epoch 5 | time: 1584.07s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 27 / 62 | epoch 5 | time: 1595.88s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
| end of split 28 / 62 | epoch 5 | time: 1594.85s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
| end of split 29 / 62 | epoch 5 | time: 1592.49s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
| end of split 30 / 62 | epoch 5 | time: 1592.88s | valid loss 0.9459 | valid ppl 2.5750 | learning rate 5.0000
| end of split 31 / 62 | epoch 5 | time: 1595.11s | valid loss 0.9458 | valid ppl 2.5747 | learning rate 5.0000
| end of split 32 / 62 | epoch 5 | time: 1596.27s | valid loss 0.9458 | valid ppl 2.5748 | learning rate 5.0000
| end of split 33 / 62 | epoch 5 | time: 1593.21s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
| end of split 34 / 62 | epoch 5 | time: 1594.40s | valid loss 0.9457 | valid ppl 2.5746 | learning rate 5.0000
| end of split 35 / 62 | epoch 5 | time: 1590.87s | valid loss 0.9455 | valid ppl 2.5741 | learning rate 5.0000
| end of split 36 / 62 | epoch 5 | time: 1593.79s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 37 / 62 | epoch 5 | time: 1591.50s | valid loss 0.9456 | valid ppl 2.5745 | learning rate 5.0000
| end of split 38 / 62 | epoch 5 | time: 1589.49s | valid loss 0.9457 | valid ppl 2.5745 | learning rate 5.0000
| end of split 39 / 62 | epoch 5 | time: 1590.75s | valid loss 0.9480 | valid ppl 2.5806 | learning rate 5.0000
| end of split 40 / 62 | epoch 5 | time: 1590.43s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
| end of split 41 / 62 | epoch 5 | time: 1590.08s | valid loss 0.9455 | valid ppl 2.5741 | learning rate 5.0000
| end of split 42 / 62 | epoch 5 | time: 1589.48s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
| end of split 43 / 62 | epoch 5 | time: 1587.62s | valid loss 0.9457 | valid ppl 2.5745 | learning rate 5.0000
| end of split 44 / 62 | epoch 5 | time: 1586.79s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 45 / 62 | epoch 5 | time: 1585.86s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 46 / 62 | epoch 5 | time: 1586.95s | valid loss 0.9454 | valid ppl 2.5738 | learning rate 5.0000
| end of split 47 / 62 | epoch 5 | time: 1587.96s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
| end of split 48 / 62 | epoch 5 | time: 1587.28s | valid loss 0.9455 | valid ppl 2.5740 | learning rate 5.0000
| end of split 49 / 62 | epoch 5 | time: 1587.77s | valid loss 0.9451 | valid ppl 2.5732 | learning rate 5.0000
| end of split 50 / 62 | epoch 5 | time: 1586.98s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 51 / 62 | epoch 5 | time: 1585.51s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
| end of split 52 / 62 | epoch 5 | time: 1586.57s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
| end of split 53 / 62 | epoch 5 | time: 1586.75s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
| end of split 54 / 62 | epoch 5 | time: 846.84s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
| end of split 55 / 62 | epoch 5 | time: 1583.94s | valid loss 0.9451 | valid ppl 2.5730 | learning rate 5.0000
| end of split 56 / 62 | epoch 5 | time: 1585.75s | valid loss 0.9451 | valid ppl 2.5732 | learning rate 5.0000
| end of split 57 / 62 | epoch 5 | time: 1585.81s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
| end of split 58 / 62 | epoch 5 | time: 1586.18s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
| end of split 59 / 62 | epoch 5 | time: 1586.85s | valid loss 0.9449 | valid ppl 2.5725 | learning rate 5.0000
| end of split 60 / 62 | epoch 5 | time: 1591.84s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
| end of split 61 / 62 | epoch 5 | time: 1592.74s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
| end of split 62 / 62 | epoch 5 | time: 1595.38s | valid loss 0.9449 | valid ppl 2.5726 | learning rate 5.0000
| end of split 1 / 62 | epoch 6 | time: 1594.09s | valid loss 0.9448 | valid ppl 2.5724 | learning rate 5.0000
| end of split 2 / 62 | epoch 6 | time: 1598.24s | valid loss 0.9448 | valid ppl 2.5723 | learning rate 5.0000
| end of split 3 / 62 | epoch 6 | time: 1598.85s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
| end of split 4 / 62 | epoch 6 | time: 1593.37s | valid loss 0.9448 | valid ppl 2.5723 | learning rate 5.0000
| end of split 5 / 62 | epoch 6 | time: 1586.31s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
| end of split 6 / 62 | epoch 6 | time: 1586.36s | valid loss 0.9446 | valid ppl 2.5718 | learning rate 5.0000
| end of split 7 / 62 | epoch 6 | time: 1584.08s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
| end of split 8 / 62 | epoch 6 | time: 1584.49s | valid loss 0.9445 | valid ppl 2.5716 | learning rate 5.0000
| end of split 9 / 62 | epoch 6 | time: 1583.63s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
| end of split 10 / 62 | epoch 6 | time: 1582.25s | valid loss 0.9446 | valid ppl 2.5718 | learning rate 5.0000
| end of split 11 / 62 | epoch 6 | time: 1583.67s | valid loss 0.9447 | valid ppl 2.5721 | learning rate 5.0000
| end of split 12 / 62 | epoch 6 | time: 1592.91s | valid loss 0.9445 | valid ppl 2.5715 | learning rate 5.0000
| end of split 13 / 62 | epoch 6 | time: 1591.67s | valid loss 0.9445 | valid ppl 2.5716 | learning rate 5.0000
| end of split 14 / 62 | epoch 6 | time: 1593.32s | valid loss 0.9444 | valid ppl 2.5712 | learning rate 5.0000
| end of split 15 / 62 | epoch 6 | time: 1595.18s | valid loss 0.9444 | valid ppl 2.5714 | learning rate 5.0000
| end of split 16 / 62 | epoch 6 | time: 1595.10s | valid loss 0.9447 | valid ppl 2.5719 | learning rate 5.0000
| end of split 17 / 62 | epoch 6 | time: 1595.70s | valid loss 0.9444 | valid ppl 2.5711 | learning rate 5.0000
| end of split 18 / 62 | epoch 6 | time: 1593.68s | valid loss 0.9444 | valid ppl 2.5713 | learning rate 5.0000
| end of split 19 / 62 | epoch 6 | time: 1595.28s | valid loss 0.9448 | valid ppl 2.5722 | learning rate 5.0000
| end of split 20 / 62 | epoch 6 | time: 1595.01s | valid loss 0.9475 | valid ppl 2.5793 | learning rate 5.0000
| end of split 21 / 62 | epoch 6 | time: 1594.95s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
| end of split 22 / 62 | epoch 6 | time: 1595.46s | valid loss 0.9453 | valid ppl 2.5736 | learning rate 5.0000
| end of split 23 / 62 | epoch 6 | time: 1597.41s | valid loss 0.9442 | valid ppl 2.5708 | learning rate 5.0000
| end of split 24 / 62 | epoch 6 | time: 1597.13s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
| end of split 25 / 62 | epoch 6 | time: 1595.18s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
| end of split 26 / 62 | epoch 6 | time: 1594.01s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
| end of split 27 / 62 | epoch 6 | time: 1594.84s | valid loss 0.9443 | valid ppl 2.5710 | learning rate 5.0000
| end of split 28 / 62 | epoch 6 | time: 1592.94s | valid loss 0.9441 | valid ppl 2.5705 | learning rate 5.0000
| end of split 29 / 62 | epoch 6 | time: 1591.38s | valid loss 0.9443 | valid ppl 2.5711 | learning rate 5.0000
| end of split 30 / 62 | epoch 6 | time: 1590.34s | valid loss 0.9442 | valid ppl 2.5707 | learning rate 5.0000
| end of split 31 / 62 | epoch 6 | time: 1592.84s | valid loss 0.9441 | valid ppl 2.5704 | learning rate 5.0000
| end of split 32 / 62 | epoch 6 | time: 1589.97s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
| end of split 33 / 62 | epoch 6 | time: 1589.48s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
| end of split 34 / 62 | epoch 6 | time: 1590.99s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 5.0000
| end of split 35 / 62 | epoch 6 | time: 1587.27s | valid loss 0.9441 | valid ppl 2.5706 | learning rate 5.0000
| end of split 36 / 62 | epoch 6 | time: 1589.43s | valid loss 0.9433 | valid ppl 2.5683 | learning rate 1.2500
| end of split 37 / 62 | epoch 6 | time: 1590.89s | valid loss 0.9431 | valid ppl 2.5680 | learning rate 1.2500
| end of split 38 / 62 | epoch 6 | time: 1591.30s | valid loss 0.9431 | valid ppl 2.5679 | learning rate 1.2500
| end of split 39 / 62 | epoch 6 | time: 1587.59s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 40 / 62 | epoch 6 | time: 1589.99s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 41 / 62 | epoch 6 | time: 848.87s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
| end of split 42 / 62 | epoch 6 | time: 1589.92s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 43 / 62 | epoch 6 | time: 1588.08s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 44 / 62 | epoch 6 | time: 1586.96s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 45 / 62 | epoch 6 | time: 1587.55s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 46 / 62 | epoch 6 | time: 1586.69s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 47 / 62 | epoch 6 | time: 1587.20s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
| end of split 48 / 62 | epoch 6 | time: 1587.64s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
| end of split 49 / 62 | epoch 6 | time: 1579.53s | valid loss 0.9427 | valid ppl 2.5670 | learning rate 1.2500
| end of split 50 / 62 | epoch 6 | time: 1577.89s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
| end of split 51 / 62 | epoch 6 | time: 1574.78s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 52 / 62 | epoch 6 | time: 1575.34s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
| end of split 53 / 62 | epoch 6 | time: 1574.50s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 54 / 62 | epoch 6 | time: 1578.06s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 55 / 62 | epoch 6 | time: 1577.22s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 56 / 62 | epoch 6 | time: 1577.40s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 57 / 62 | epoch 6 | time: 1579.42s | valid loss 0.9426 | valid ppl 2.5668 | learning rate 1.2500
| end of split 58 / 62 | epoch 6 | time: 1575.45s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 59 / 62 | epoch 6 | time: 1577.22s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 60 / 62 | epoch 6 | time: 1582.29s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 61 / 62 | epoch 6 | time: 1588.61s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
| end of split 62 / 62 | epoch 6 | time: 1588.70s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
| end of split 1 / 62 | epoch 7 | time: 1584.79s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 2 / 62 | epoch 7 | time: 1588.80s | valid loss 0.9426 | valid ppl 2.5665 | learning rate 1.2500
| end of split 3 / 62 | epoch 7 | time: 1589.28s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 0.3125
| end of split 4 / 62 | epoch 7 | time: 1589.32s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 0.3125
| end of split 5 / 62 | epoch 7 | time: 1591.86s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 0.3125
| end of split 6 / 62 | epoch 7 | time: 1590.36s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 0.3125
| end of split 7 / 62 | epoch 7 | time: 1590.53s | valid loss 0.9423 | valid ppl 2.5659 | learning rate 0.3125
| end of split 8 / 62 | epoch 7 | time: 1589.81s | valid loss 0.9423 | valid ppl 2.5659 | learning rate 0.3125
| end of split 9 / 62 | epoch 7 | time: 1590.82s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 10 / 62 | epoch 7 | time: 1591.41s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 11 / 62 | epoch 7 | time: 1592.90s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 12 / 62 | epoch 7 | time: 1594.52s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 13 / 62 | epoch 7 | time: 1592.98s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 14 / 62 | epoch 7 | time: 1591.85s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 15 / 62 | epoch 7 | time: 1593.69s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 16 / 62 | epoch 7 | time: 850.92s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 17 / 62 | epoch 7 | time: 1591.86s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 18 / 62 | epoch 7 | time: 1591.87s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 19 / 62 | epoch 7 | time: 1590.77s | valid loss 0.9423 | valid ppl 2.5658 | learning rate 0.3125
| end of split 20 / 62 | epoch 7 | time: 1592.50s | valid loss 0.9422 | valid ppl 2.5657 | learning rate 0.3125
| end of split 21 / 62 | epoch 7 | time: 1590.69s | valid loss 0.9422 | valid ppl 2.5657 | learning rate 0.0781
| end of split 22 / 62 | epoch 7 | time: 1588.52s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 23 / 62 | epoch 7 | time: 1591.35s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 24 / 62 | epoch 7 | time: 1592.13s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 25 / 62 | epoch 7 | time: 1590.33s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 26 / 62 | epoch 7 | time: 1593.30s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 27 / 62 | epoch 7 | time: 1591.57s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 28 / 62 | epoch 7 | time: 1590.85s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 29 / 62 | epoch 7 | time: 1591.07s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 30 / 62 | epoch 7 | time: 1589.17s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 31 / 62 | epoch 7 | time: 1590.29s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0781
| end of split 32 / 62 | epoch 7 | time: 1588.94s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0195
| end of split 33 / 62 | epoch 7 | time: 1589.33s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.0195
| end of split 34 / 62 | epoch 7 | time: 1588.78s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 35 / 62 | epoch 7 | time: 1589.30s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 36 / 62 | epoch 7 | time: 1587.55s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 37 / 62 | epoch 7 | time: 1586.43s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 38 / 62 | epoch 7 | time: 1586.62s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 39 / 62 | epoch 7 | time: 1586.33s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 40 / 62 | epoch 7 | time: 1586.73s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 41 / 62 | epoch 7 | time: 1584.33s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 42 / 62 | epoch 7 | time: 1585.00s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 43 / 62 | epoch 7 | time: 1588.09s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 44 / 62 | epoch 7 | time: 1590.56s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.0195
| end of split 45 / 62 | epoch 7 | time: 1590.53s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0195
| end of split 46 / 62 | epoch 7 | time: 1595.27s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 47 / 62 | epoch 7 | time: 1599.33s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 48 / 62 | epoch 7 | time: 1598.60s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 49 / 62 | epoch 7 | time: 1598.68s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 50 / 62 | epoch 7 | time: 1600.25s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 51 / 62 | epoch 7 | time: 1597.95s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 52 / 62 | epoch 7 | time: 1598.75s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 53 / 62 | epoch 7 | time: 1599.63s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 54 / 62 | epoch 7 | time: 1594.92s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 55 / 62 | epoch 7 | time: 1595.71s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 56 / 62 | epoch 7 | time: 1597.02s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0049
| end of split 57 / 62 | epoch 7 | time: 1594.59s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 58 / 62 | epoch 7 | time: 1593.96s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 59 / 62 | epoch 7 | time: 1594.96s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 60 / 62 | epoch 7 | time: 1594.10s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 61 / 62 | epoch 7 | time: 1595.45s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 62 / 62 | epoch 7 | time: 1597.03s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 1 / 62 | epoch 8 | time: 1593.02s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 2 / 62 | epoch 8 | time: 1598.16s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 3 / 62 | epoch 8 | time: 1598.24s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 4 / 62 | epoch 8 | time: 1600.33s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 5 / 62 | epoch 8 | time: 1598.80s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012
| end of split 6 / 62 | epoch 8 | time: 1599.19s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 7 / 62 | epoch 8 | time: 1599.86s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 8 / 62 | epoch 8 | time: 1597.82s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 9 / 62 | epoch 8 | time: 1597.89s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 10 / 62 | epoch 8 | time: 1596.89s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 11 / 62 | epoch 8 | time: 1596.65s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 12 / 62 | epoch 8 | time: 1593.04s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 13 / 62 | epoch 8 | time: 1584.13s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 14 / 62 | epoch 8 | time: 1581.93s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 15 / 62 | epoch 8 | time: 1579.07s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 16 / 62 | epoch 8 | time: 1580.06s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0003
| end of split 17 / 62 | epoch 8 | time: 1580.03s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
| end of split 18 / 62 | epoch 8 | time: 1580.61s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
| end of split 19 / 62 | epoch 8 | time: 1579.19s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
| end of split 20 / 62 | epoch 8 | time: 1579.59s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
| end of split 21 / 62 | epoch 8 | time: 1577.85s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0001
TEST: valid loss 0.9407 | valid ppl 2.5618
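
The per-split lines above share one fixed layout, so they can be read back mechanically. The following is a minimal sketch, not part of the original training code: the names `LINE_RE`, `parse_line` and `ppl_consistent` are hypothetical. It assumes the logged perplexity is exp(valid loss), which matches the printed values up to rounding (for example exp(0.9421) is about 2.5654 against a logged 2.5655).

```python
import math
import re

# Hypothetical pattern for the "| end of split ..." lines in this log.
# The group names are illustrative and not taken from the training code.
LINE_RE = re.compile(
    r"\| end of split\s+(?P<split>\d+) / (?P<total>\d+) "
    r"\| epoch\s+(?P<epoch>\d+) "
    r"\| time: (?P<time>[\d.]+)s "
    r"\| valid loss (?P<loss>[\d.]+) "
    r"\| valid ppl (?P<ppl>[\d.]+) "
    r"\| learning rate (?P<lr>[\d.]+)"
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line is not a split line."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "split": int(fields["split"]),
        "total": int(fields["total"]),
        "epoch": int(fields["epoch"]),
        "time": float(fields["time"]),
        "loss": float(fields["loss"]),
        "ppl": float(fields["ppl"]),
        "lr": float(fields["lr"]),
    }


def ppl_consistent(record, tol=5e-4):
    """Check the assumed relation ppl = exp(loss), allowing for 4-decimal rounding."""
    return abs(math.exp(record["loss"]) - record["ppl"]) <= tol


sample = ("| end of split 62 / 62 | epoch 7 | time: 1597.03s "
          "| valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.0012")
record = parse_line(sample)
```

The final `TEST:` line has a different layout and is deliberately not matched by this pattern, so `parse_line` returns `None` for it.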