flair-uk-forward / loss.txt
Dmitry Chaplinsky
Updated model: 683 splits, 24.39 epochs, min_loss: 1.0161, min_ppl: 2.7625
| end of split 1 / 28 | epoch 1 | time: 3240.36s | valid loss 1.6147 | valid ppl 5.0265 | learning rate 20.0000
| end of split 2 / 28 | epoch 1 | time: 3308.92s | valid loss 1.3939 | valid ppl 4.0306 | learning rate 20.0000
| end of split 3 / 28 | epoch 1 | time: 3314.96s | valid loss 1.3076 | valid ppl 3.6972 | learning rate 20.0000
| end of split 4 / 28 | epoch 1 | time: 3310.54s | valid loss 1.2635 | valid ppl 3.5377 | learning rate 20.0000
| end of split 5 / 28 | epoch 1 | time: 3314.64s | valid loss 1.2355 | valid ppl 3.4401 | learning rate 20.0000
| end of split 6 / 28 | epoch 1 | time: 3315.86s | valid loss 1.2150 | valid ppl 3.3702 | learning rate 20.0000
| end of split 7 / 28 | epoch 1 | time: 3310.63s | valid loss 1.1991 | valid ppl 3.3170 | learning rate 20.0000
| end of split 8 / 28 | epoch 1 | time: 3308.35s | valid loss 1.1851 | valid ppl 3.2712 | learning rate 20.0000
| end of split 9 / 28 | epoch 1 | time: 3300.72s | valid loss 1.1778 | valid ppl 3.2472 | learning rate 20.0000
| end of split 10 / 28 | epoch 1 | time: 3285.69s | valid loss 1.1703 | valid ppl 3.2231 | learning rate 20.0000
| end of split 11 / 28 | epoch 1 | time: 3296.28s | valid loss 1.1585 | valid ppl 3.1851 | learning rate 20.0000
| end of split 12 / 28 | epoch 1 | time: 3295.62s | valid loss 1.1557 | valid ppl 3.1762 | learning rate 20.0000
| end of split 13 / 28 | epoch 1 | time: 3299.01s | valid loss 1.1500 | valid ppl 3.1581 | learning rate 20.0000
| end of split 14 / 28 | epoch 1 | time: 3286.78s | valid loss 1.1402 | valid ppl 3.1274 | learning rate 20.0000
| end of split 15 / 28 | epoch 1 | time: 3297.94s | valid loss 1.1399 | valid ppl 3.1264 | learning rate 20.0000
| end of split 16 / 28 | epoch 1 | time: 3232.64s | valid loss 1.1346 | valid ppl 3.1099 | learning rate 20.0000
| end of split 17 / 28 | epoch 1 | time: 3083.81s | valid loss 1.1279 | valid ppl 3.0892 | learning rate 20.0000
| end of split 18 / 28 | epoch 1 | time: 3084.41s | valid loss 1.1277 | valid ppl 3.0885 | learning rate 20.0000
| end of split 19 / 28 | epoch 1 | time: 3083.52s | valid loss 1.1237 | valid ppl 3.0762 | learning rate 20.0000
| end of split 20 / 28 | epoch 1 | time: 3083.77s | valid loss 1.1200 | valid ppl 3.0649 | learning rate 20.0000
| end of split 21 / 28 | epoch 1 | time: 3080.82s | valid loss 1.1170 | valid ppl 3.0556 | learning rate 20.0000
| end of split 22 / 28 | epoch 1 | time: 3081.82s | valid loss 1.1157 | valid ppl 3.0516 | learning rate 20.0000
| end of split 23 / 28 | epoch 1 | time: 3083.61s | valid loss 1.1135 | valid ppl 3.0450 | learning rate 20.0000
| end of split 24 / 28 | epoch 1 | time: 3083.95s | valid loss 1.1100 | valid ppl 3.0343 | learning rate 20.0000
| end of split 25 / 28 | epoch 1 | time: 3079.21s | valid loss 1.1072 | valid ppl 3.0260 | learning rate 20.0000
| end of split 26 / 28 | epoch 1 | time: 3083.88s | valid loss 1.1086 | valid ppl 3.0303 | learning rate 20.0000
| end of split 27 / 28 | epoch 1 | time: 3203.88s | valid loss 1.1031 | valid ppl 3.0134 | learning rate 20.0000
| end of split 28 / 28 | epoch 1 | time: 965.58s | valid loss 1.1026 | valid ppl 3.0121 | learning rate 20.0000
| end of split 1 / 28 | epoch 2 | time: 3314.12s | valid loss 1.1022 | valid ppl 3.0108 | learning rate 20.0000
| end of split 2 / 28 | epoch 2 | time: 3475.79s | valid loss 1.0990 | valid ppl 3.0012 | learning rate 20.0000
| end of split 3 / 28 | epoch 2 | time: 3500.92s | valid loss 1.0974 | valid ppl 2.9965 | learning rate 20.0000
| end of split 4 / 28 | epoch 2 | time: 3501.92s | valid loss 1.0997 | valid ppl 3.0032 | learning rate 20.0000
| end of split 5 / 28 | epoch 2 | time: 3507.32s | valid loss 1.0945 | valid ppl 2.9878 | learning rate 20.0000
| end of split 6 / 28 | epoch 2 | time: 3502.18s | valid loss 1.0936 | valid ppl 2.9851 | learning rate 20.0000
| end of split 7 / 28 | epoch 2 | time: 999.87s | valid loss 1.0941 | valid ppl 2.9866 | learning rate 20.0000
| end of split 8 / 28 | epoch 2 | time: 3343.84s | valid loss 1.0923 | valid ppl 2.9810 | learning rate 20.0000
| end of split 9 / 28 | epoch 2 | time: 3340.49s | valid loss 1.0905 | valid ppl 2.9758 | learning rate 20.0000
| end of split 10 / 28 | epoch 2 | time: 3338.10s | valid loss 1.0919 | valid ppl 2.9798 | learning rate 20.0000
| end of split 11 / 28 | epoch 2 | time: 3331.60s | valid loss 1.0896 | valid ppl 2.9730 | learning rate 20.0000
| end of split 12 / 28 | epoch 2 | time: 3337.05s | valid loss 1.0863 | valid ppl 2.9632 | learning rate 20.0000
| end of split 13 / 28 | epoch 2 | time: 3336.59s | valid loss 1.0850 | valid ppl 2.9594 | learning rate 20.0000
| end of split 14 / 28 | epoch 2 | time: 3333.13s | valid loss 1.0850 | valid ppl 2.9593 | learning rate 20.0000
| end of split 15 / 28 | epoch 2 | time: 3331.93s | valid loss 1.0846 | valid ppl 2.9582 | learning rate 20.0000
| end of split 16 / 28 | epoch 2 | time: 3301.36s | valid loss 1.0835 | valid ppl 2.9549 | learning rate 20.0000
| end of split 17 / 28 | epoch 2 | time: 3308.70s | valid loss 1.0819 | valid ppl 2.9503 | learning rate 20.0000
| end of split 18 / 28 | epoch 2 | time: 3316.27s | valid loss 1.0817 | valid ppl 2.9497 | learning rate 20.0000
| end of split 19 / 28 | epoch 2 | time: 3310.75s | valid loss 1.0806 | valid ppl 2.9465 | learning rate 20.0000
| end of split 20 / 28 | epoch 2 | time: 3311.32s | valid loss 1.0781 | valid ppl 2.9391 | learning rate 20.0000
| end of split 21 / 28 | epoch 2 | time: 3309.05s | valid loss 1.0776 | valid ppl 2.9375 | learning rate 20.0000
| end of split 22 / 28 | epoch 2 | time: 3310.70s | valid loss 1.0780 | valid ppl 2.9389 | learning rate 20.0000
| end of split 23 / 28 | epoch 2 | time: 3311.48s | valid loss 1.0797 | valid ppl 2.9439 | learning rate 20.0000
| end of split 24 / 28 | epoch 2 | time: 3309.16s | valid loss 1.0760 | valid ppl 2.9330 | learning rate 20.0000
| end of split 25 / 28 | epoch 2 | time: 3300.41s | valid loss 1.0757 | valid ppl 2.9319 | learning rate 20.0000
| end of split 26 / 28 | epoch 2 | time: 3305.46s | valid loss 1.0736 | valid ppl 2.9260 | learning rate 20.0000
| end of split 27 / 28 | epoch 2 | time: 3307.40s | valid loss 1.0725 | valid ppl 2.9227 | learning rate 20.0000
| end of split 28 / 28 | epoch 2 | time: 3308.75s | valid loss 1.0735 | valid ppl 2.9256 | learning rate 20.0000
| end of split 1 / 28 | epoch 3 | time: 3335.00s | valid loss 1.0734 | valid ppl 2.9253 | learning rate 20.0000
| end of split 2 / 28 | epoch 3 | time: 3357.23s | valid loss 1.0715 | valid ppl 2.9198 | learning rate 20.0000
| end of split 3 / 28 | epoch 3 | time: 3354.52s | valid loss 1.0707 | valid ppl 2.9174 | learning rate 20.0000
| end of split 4 / 28 | epoch 3 | time: 3352.96s | valid loss 1.0696 | valid ppl 2.9143 | learning rate 20.0000
| end of split 5 / 28 | epoch 3 | time: 3350.73s | valid loss 1.0690 | valid ppl 2.9126 | learning rate 20.0000
| end of split 6 / 28 | epoch 3 | time: 3351.52s | valid loss 1.0686 | valid ppl 2.9113 | learning rate 20.0000
| end of split 7 / 28 | epoch 3 | time: 3334.50s | valid loss 1.0666 | valid ppl 2.9056 | learning rate 20.0000
| end of split 8 / 28 | epoch 3 | time: 3335.75s | valid loss 1.0687 | valid ppl 2.9115 | learning rate 20.0000
| end of split 9 / 28 | epoch 3 | time: 979.52s | valid loss 1.0667 | valid ppl 2.9058 | learning rate 20.0000
| end of split 10 / 28 | epoch 3 | time: 3340.27s | valid loss 1.0666 | valid ppl 2.9054 | learning rate 20.0000
| end of split 11 / 28 | epoch 3 | time: 3343.01s | valid loss 1.0676 | valid ppl 2.9084 | learning rate 20.0000
| end of split 12 / 28 | epoch 3 | time: 3344.63s | valid loss 1.0656 | valid ppl 2.9024 | learning rate 20.0000
| end of split 13 / 28 | epoch 3 | time: 3330.31s | valid loss 1.0663 | valid ppl 2.9047 | learning rate 20.0000
| end of split 14 / 28 | epoch 3 | time: 3340.17s | valid loss 1.0662 | valid ppl 2.9043 | learning rate 20.0000
| end of split 15 / 28 | epoch 3 | time: 3331.70s | valid loss 1.0651 | valid ppl 2.9010 | learning rate 20.0000
| end of split 16 / 28 | epoch 3 | time: 3345.00s | valid loss 1.0646 | valid ppl 2.8996 | learning rate 20.0000
| end of split 17 / 28 | epoch 3 | time: 3344.04s | valid loss 1.0627 | valid ppl 2.8943 | learning rate 20.0000
| end of split 18 / 28 | epoch 3 | time: 3342.21s | valid loss 1.0623 | valid ppl 2.8931 | learning rate 20.0000
| end of split 19 / 28 | epoch 3 | time: 3340.44s | valid loss 1.0627 | valid ppl 2.8941 | learning rate 20.0000
| end of split 20 / 28 | epoch 3 | time: 3308.47s | valid loss 1.0604 | valid ppl 2.8875 | learning rate 20.0000
| end of split 21 / 28 | epoch 3 | time: 3315.07s | valid loss 1.0617 | valid ppl 2.8912 | learning rate 20.0000
| end of split 22 / 28 | epoch 3 | time: 3323.04s | valid loss 1.0607 | valid ppl 2.8884 | learning rate 20.0000
| end of split 23 / 28 | epoch 3 | time: 3322.40s | valid loss 1.0600 | valid ppl 2.8863 | learning rate 20.0000
| end of split 24 / 28 | epoch 3 | time: 3328.09s | valid loss 1.0621 | valid ppl 2.8925 | learning rate 20.0000
| end of split 25 / 28 | epoch 3 | time: 3337.84s | valid loss 1.0617 | valid ppl 2.8912 | learning rate 20.0000
| end of split 26 / 28 | epoch 3 | time: 3328.62s | valid loss 1.0595 | valid ppl 2.8849 | learning rate 20.0000
| end of split 27 / 28 | epoch 3 | time: 3329.98s | valid loss 1.0603 | valid ppl 2.8871 | learning rate 20.0000
| end of split 28 / 28 | epoch 3 | time: 3326.62s | valid loss 1.0592 | valid ppl 2.8841 | learning rate 20.0000
| end of split 1 / 28 | epoch 4 | time: 3362.65s | valid loss 1.0588 | valid ppl 2.8829 | learning rate 20.0000
| end of split 2 / 28 | epoch 4 | time: 3372.84s | valid loss 1.0574 | valid ppl 2.8788 | learning rate 20.0000
| end of split 3 / 28 | epoch 4 | time: 3369.82s | valid loss 1.0593 | valid ppl 2.8843 | learning rate 20.0000
| end of split 4 / 28 | epoch 4 | time: 3369.24s | valid loss 1.0561 | valid ppl 2.8750 | learning rate 20.0000
| end of split 5 / 28 | epoch 4 | time: 3362.94s | valid loss 1.0567 | valid ppl 2.8768 | learning rate 20.0000
| end of split 6 / 28 | epoch 4 | time: 3364.27s | valid loss 1.0591 | valid ppl 2.8837 | learning rate 20.0000
| end of split 7 / 28 | epoch 4 | time: 3356.17s | valid loss 1.0548 | valid ppl 2.8714 | learning rate 20.0000
| end of split 8 / 28 | epoch 4 | time: 3345.16s | valid loss 1.0556 | valid ppl 2.8737 | learning rate 20.0000
| end of split 9 / 28 | epoch 4 | time: 3341.86s | valid loss 1.0568 | valid ppl 2.8771 | learning rate 20.0000
| end of split 10 / 28 | epoch 4 | time: 980.93s | valid loss 1.0546 | valid ppl 2.8708 | learning rate 20.0000
| end of split 11 / 28 | epoch 4 | time: 3346.04s | valid loss 1.0547 | valid ppl 2.8712 | learning rate 20.0000
| end of split 12 / 28 | epoch 4 | time: 3335.92s | valid loss 1.0545 | valid ppl 2.8705 | learning rate 20.0000
| end of split 13 / 28 | epoch 4 | time: 3336.81s | valid loss 1.0535 | valid ppl 2.8676 | learning rate 20.0000
| end of split 14 / 28 | epoch 4 | time: 3336.67s | valid loss 1.0539 | valid ppl 2.8689 | learning rate 20.0000
| end of split 15 / 28 | epoch 4 | time: 3337.57s | valid loss 1.0542 | valid ppl 2.8697 | learning rate 20.0000
| end of split 16 / 28 | epoch 4 | time: 3335.23s | valid loss 1.0544 | valid ppl 2.8702 | learning rate 20.0000
| end of split 17 / 28 | epoch 4 | time: 3337.46s | valid loss 1.0548 | valid ppl 2.8714 | learning rate 20.0000
| end of split 18 / 28 | epoch 4 | time: 3336.78s | valid loss 1.0522 | valid ppl 2.8641 | learning rate 20.0000
| end of split 19 / 28 | epoch 4 | time: 3335.97s | valid loss 1.0516 | valid ppl 2.8623 | learning rate 20.0000
| end of split 20 / 28 | epoch 4 | time: 3342.62s | valid loss 1.0522 | valid ppl 2.8639 | learning rate 20.0000
| end of split 21 / 28 | epoch 4 | time: 3346.48s | valid loss 1.0513 | valid ppl 2.8614 | learning rate 20.0000
| end of split 22 / 28 | epoch 4 | time: 3355.85s | valid loss 1.0510 | valid ppl 2.8605 | learning rate 20.0000
| end of split 23 / 28 | epoch 4 | time: 3359.76s | valid loss 1.0521 | valid ppl 2.8636 | learning rate 20.0000
| end of split 24 / 28 | epoch 4 | time: 3329.20s | valid loss 1.0524 | valid ppl 2.8644 | learning rate 20.0000
| end of split 25 / 28 | epoch 4 | time: 3355.82s | valid loss 1.0504 | valid ppl 2.8588 | learning rate 20.0000
| end of split 26 / 28 | epoch 4 | time: 3367.07s | valid loss 1.0508 | valid ppl 2.8600 | learning rate 20.0000
| end of split 27 / 28 | epoch 4 | time: 3366.55s | valid loss 1.0500 | valid ppl 2.8577 | learning rate 20.0000
| end of split 28 / 28 | epoch 4 | time: 3369.33s | valid loss 1.0501 | valid ppl 2.8580 | learning rate 20.0000
| end of split 1 / 28 | epoch 5 | time: 3342.95s | valid loss 1.0492 | valid ppl 2.8555 | learning rate 20.0000
| end of split 2 / 28 | epoch 5 | time: 3366.55s | valid loss 1.0498 | valid ppl 2.8571 | learning rate 20.0000
| end of split 3 / 28 | epoch 5 | time: 3356.80s | valid loss 1.0495 | valid ppl 2.8562 | learning rate 20.0000
| end of split 4 / 28 | epoch 5 | time: 3350.85s | valid loss 1.0484 | valid ppl 2.8531 | learning rate 20.0000
| end of split 5 / 28 | epoch 5 | time: 3351.73s | valid loss 1.0488 | valid ppl 2.8543 | learning rate 20.0000
| end of split 6 / 28 | epoch 5 | time: 3351.26s | valid loss 1.0479 | valid ppl 2.8516 | learning rate 20.0000
| end of split 7 / 28 | epoch 5 | time: 3351.24s | valid loss 1.0478 | valid ppl 2.8513 | learning rate 20.0000
| end of split 8 / 28 | epoch 5 | time: 3349.83s | valid loss 1.0484 | valid ppl 2.8531 | learning rate 20.0000
| end of split 9 / 28 | epoch 5 | time: 3348.30s | valid loss 1.0484 | valid ppl 2.8530 | learning rate 20.0000
| end of split 10 / 28 | epoch 5 | time: 3333.65s | valid loss 1.0473 | valid ppl 2.8500 | learning rate 20.0000
| end of split 11 / 28 | epoch 5 | time: 3345.83s | valid loss 1.0469 | valid ppl 2.8487 | learning rate 20.0000
| end of split 12 / 28 | epoch 5 | time: 3344.22s | valid loss 1.0480 | valid ppl 2.8518 | learning rate 20.0000
| end of split 13 / 28 | epoch 5 | time: 11361.46s | valid loss 1.0469 | valid ppl 2.8487 | learning rate 20.0000
| end of split 14 / 28 | epoch 5 | time: 3345.80s | valid loss 1.0478 | valid ppl 2.8514 | learning rate 20.0000
| end of split 15 / 28 | epoch 5 | time: 3347.61s | valid loss 1.0450 | valid ppl 2.8433 | learning rate 20.0000
| end of split 16 / 28 | epoch 5 | time: 3338.68s | valid loss 1.0458 | valid ppl 2.8456 | learning rate 20.0000
| end of split 17 / 28 | epoch 5 | time: 3356.79s | valid loss 1.0462 | valid ppl 2.8468 | learning rate 20.0000
| end of split 18 / 28 | epoch 5 | time: 3354.23s | valid loss 1.0468 | valid ppl 2.8486 | learning rate 20.0000
| end of split 19 / 28 | epoch 5 | time: 3361.30s | valid loss 1.0468 | valid ppl 2.8485 | learning rate 20.0000
| end of split 20 / 28 | epoch 5 | time: 3362.74s | valid loss 1.0451 | valid ppl 2.8436 | learning rate 20.0000
| end of split 21 / 28 | epoch 5 | time: 3369.02s | valid loss 1.0454 | valid ppl 2.8446 | learning rate 20.0000
| end of split 22 / 28 | epoch 5 | time: 988.45s | valid loss 1.0442 | valid ppl 2.8412 | learning rate 20.0000
| end of split 23 / 28 | epoch 5 | time: 3371.99s | valid loss 1.0436 | valid ppl 2.8394 | learning rate 20.0000
| end of split 24 / 28 | epoch 5 | time: 3372.04s | valid loss 1.0443 | valid ppl 2.8413 | learning rate 20.0000
| end of split 25 / 28 | epoch 5 | time: 3342.55s | valid loss 1.0439 | valid ppl 2.8402 | learning rate 20.0000
| end of split 26 / 28 | epoch 5 | time: 3360.09s | valid loss 1.0445 | valid ppl 2.8420 | learning rate 20.0000
| end of split 27 / 28 | epoch 5 | time: 3360.59s | valid loss 1.0434 | valid ppl 2.8390 | learning rate 20.0000
| end of split 28 / 28 | epoch 5 | time: 3355.31s | valid loss 1.0463 | valid ppl 2.8472 | learning rate 20.0000
| end of split 1 / 28 | epoch 6 | time: 3342.94s | valid loss 1.0447 | valid ppl 2.8424 | learning rate 20.0000
| end of split 2 / 28 | epoch 6 | time: 3349.59s | valid loss 1.0426 | valid ppl 2.8366 | learning rate 20.0000
| end of split 3 / 28 | epoch 6 | time: 3350.27s | valid loss 1.0440 | valid ppl 2.8405 | learning rate 20.0000
| end of split 4 / 28 | epoch 6 | time: 3352.12s | valid loss 1.0435 | valid ppl 2.8393 | learning rate 20.0000
| end of split 5 / 28 | epoch 6 | time: 3352.49s | valid loss 1.0418 | valid ppl 2.8342 | learning rate 20.0000
| end of split 6 / 28 | epoch 6 | time: 3353.69s | valid loss 1.0441 | valid ppl 2.8409 | learning rate 20.0000
| end of split 7 / 28 | epoch 6 | time: 3351.74s | valid loss 1.0437 | valid ppl 2.8396 | learning rate 20.0000
| end of split 8 / 28 | epoch 6 | time: 3354.03s | valid loss 1.0417 | valid ppl 2.8339 | learning rate 20.0000
| end of split 9 / 28 | epoch 6 | time: 3355.56s | valid loss 1.0409 | valid ppl 2.8319 | learning rate 20.0000
| end of split 10 / 28 | epoch 6 | time: 3353.42s | valid loss 1.0410 | valid ppl 2.8320 | learning rate 20.0000
| end of split 11 / 28 | epoch 6 | time: 3346.88s | valid loss 1.0406 | valid ppl 2.8308 | learning rate 20.0000
| end of split 12 / 28 | epoch 6 | time: 3351.99s | valid loss 1.0438 | valid ppl 2.8400 | learning rate 20.0000
| end of split 13 / 28 | epoch 6 | time: 3363.46s | valid loss 1.0416 | valid ppl 2.8338 | learning rate 20.0000
| end of split 14 / 28 | epoch 6 | time: 991.85s | valid loss 1.0420 | valid ppl 2.8350 | learning rate 20.0000
| end of split 15 / 28 | epoch 6 | time: 3390.33s | valid loss 1.0414 | valid ppl 2.8330 | learning rate 20.0000
| end of split 16 / 28 | epoch 6 | time: 3389.41s | valid loss 1.0402 | valid ppl 2.8297 | learning rate 20.0000
| end of split 17 / 28 | epoch 6 | time: 3389.89s | valid loss 1.0404 | valid ppl 2.8303 | learning rate 20.0000
| end of split 18 / 28 | epoch 6 | time: 3380.84s | valid loss 1.0426 | valid ppl 2.8367 | learning rate 20.0000
| end of split 19 / 28 | epoch 6 | time: 3391.71s | valid loss 1.0410 | valid ppl 2.8321 | learning rate 20.0000
| end of split 20 / 28 | epoch 6 | time: 3380.11s | valid loss 1.0404 | valid ppl 2.8304 | learning rate 20.0000
| end of split 21 / 28 | epoch 6 | time: 3389.38s | valid loss 1.0398 | valid ppl 2.8287 | learning rate 20.0000
| end of split 22 / 28 | epoch 6 | time: 3384.16s | valid loss 1.0410 | valid ppl 2.8321 | learning rate 20.0000
| end of split 23 / 28 | epoch 6 | time: 3386.73s | valid loss 1.0423 | valid ppl 2.8357 | learning rate 20.0000
| end of split 24 / 28 | epoch 6 | time: 3384.70s | valid loss 1.0402 | valid ppl 2.8299 | learning rate 20.0000
| end of split 25 / 28 | epoch 6 | time: 3380.35s | valid loss 1.0393 | valid ppl 2.8272 | learning rate 20.0000
| end of split 26 / 28 | epoch 6 | time: 3379.51s | valid loss 1.0420 | valid ppl 2.8350 | learning rate 20.0000
| end of split 27 / 28 | epoch 6 | time: 3374.38s | valid loss 1.0412 | valid ppl 2.8326 | learning rate 20.0000
| end of split 28 / 28 | epoch 6 | time: 3368.86s | valid loss 1.0386 | valid ppl 2.8252 | learning rate 20.0000
| end of split 29 / 28 | epoch 6 | time: 3370.84s | valid loss 1.0389 | valid ppl 2.8262 | learning rate 20.0000
| end of split 30 / 28 | epoch 6 | time: 3387.68s | valid loss 1.0395 | valid ppl 2.8277 | learning rate 20.0000
| end of split 31 / 28 | epoch 6 | time: 3375.92s | valid loss 1.0390 | valid ppl 2.8265 | learning rate 20.0000
| end of split 32 / 28 | epoch 6 | time: 3383.55s | valid loss 1.0388 | valid ppl 2.8258 | learning rate 20.0000
| end of split 33 / 28 | epoch 6 | time: 3381.55s | valid loss 1.0378 | valid ppl 2.8230 | learning rate 20.0000
| end of split 34 / 28 | epoch 6 | time: 991.13s | valid loss 1.0382 | valid ppl 2.8241 | learning rate 20.0000
| end of split 35 / 28 | epoch 6 | time: 3384.35s | valid loss 1.0382 | valid ppl 2.8240 | learning rate 20.0000
| end of split 36 / 28 | epoch 6 | time: 3380.81s | valid loss 1.0377 | valid ppl 2.8228 | learning rate 20.0000
| end of split 37 / 28 | epoch 6 | time: 3383.28s | valid loss 1.0381 | valid ppl 2.8239 | learning rate 20.0000
| end of split 38 / 28 | epoch 6 | time: 3382.18s | valid loss 1.0380 | valid ppl 2.8236 | learning rate 20.0000
| end of split 39 / 28 | epoch 6 | time: 3389.48s | valid loss 1.0371 | valid ppl 2.8210 | learning rate 20.0000
| end of split 40 / 28 | epoch 6 | time: 3388.70s | valid loss 1.0386 | valid ppl 2.8252 | learning rate 20.0000
| end of split 41 / 28 | epoch 6 | time: 3390.47s | valid loss 1.0372 | valid ppl 2.8214 | learning rate 20.0000
| end of split 42 / 28 | epoch 6 | time: 3393.85s | valid loss 1.0376 | valid ppl 2.8225 | learning rate 20.0000
| end of split 43 / 28 | epoch 6 | time: 3406.04s | valid loss 1.0363 | valid ppl 2.8189 | learning rate 20.0000
| end of split 44 / 28 | epoch 6 | time: 3466.16s | valid loss 1.0365 | valid ppl 2.8194 | learning rate 20.0000
| end of split 45 / 28 | epoch 6 | time: 3444.11s | valid loss 1.0368 | valid ppl 2.8203 | learning rate 20.0000
| end of split 46 / 28 | epoch 6 | time: 3436.15s | valid loss 1.0368 | valid ppl 2.8202 | learning rate 20.0000
| end of split 47 / 28 | epoch 6 | time: 3434.69s | valid loss 1.0367 | valid ppl 2.8198 | learning rate 20.0000
| end of split 48 / 28 | epoch 6 | time: 3429.94s | valid loss 1.0401 | valid ppl 2.8295 | learning rate 20.0000
| end of split 49 / 28 | epoch 6 | time: 3426.04s | valid loss 1.0363 | valid ppl 2.8187 | learning rate 20.0000
| end of split 50 / 28 | epoch 6 | time: 3421.10s | valid loss 1.0364 | valid ppl 2.8190 | learning rate 20.0000
| end of split 51 / 28 | epoch 6 | time: 3412.38s | valid loss 1.0376 | valid ppl 2.8224 | learning rate 20.0000
| end of split 52 / 28 | epoch 6 | time: 3396.50s | valid loss 1.0363 | valid ppl 2.8187 | learning rate 20.0000
| end of split 53 / 28 | epoch 6 | time: 12135.07s | valid loss 1.0356 | valid ppl 2.8167 | learning rate 20.0000
| end of split 54 / 28 | epoch 6 | time: 3364.82s | valid loss 1.0363 | valid ppl 2.8187 | learning rate 20.0000
| end of split 55 / 28 | epoch 6 | time: 3390.69s | valid loss 1.0379 | valid ppl 2.8233 | learning rate 20.0000
| end of split 56 / 28 | epoch 6 | time: 3405.54s | valid loss 1.0355 | valid ppl 2.8164 | learning rate 20.0000
| end of split 29 / 28 | epoch 7 | time: 3355.56s | valid loss 1.0346 | valid ppl 2.8140 | learning rate 20.0000
| end of split 30 / 28 | epoch 7 | time: 3398.95s | valid loss 1.0354 | valid ppl 2.8164 | learning rate 20.0000
| end of split 31 / 28 | epoch 7 | time: 3403.92s | valid loss 1.0353 | valid ppl 2.8160 | learning rate 20.0000
| end of split 32 / 28 | epoch 7 | time: 3401.23s | valid loss 1.0362 | valid ppl 2.8186 | learning rate 20.0000
| end of split 33 / 28 | epoch 7 | time: 3399.23s | valid loss 1.0356 | valid ppl 2.8167 | learning rate 20.0000
| end of split 34 / 28 | epoch 7 | time: 3398.69s | valid loss 1.0361 | valid ppl 2.8182 | learning rate 20.0000
| end of split 35 / 28 | epoch 7 | time: 3404.16s | valid loss 1.0369 | valid ppl 2.8206 | learning rate 20.0000
| end of split 36 / 28 | epoch 7 | time: 3399.67s | valid loss 1.0342 | valid ppl 2.8128 | learning rate 20.0000
| end of split 37 / 28 | epoch 7 | time: 3398.06s | valid loss 1.0352 | valid ppl 2.8156 | learning rate 20.0000
| end of split 38 / 28 | epoch 7 | time: 3401.95s | valid loss 1.0352 | valid ppl 2.8157 | learning rate 20.0000
| end of split 39 / 28 | epoch 7 | time: 3422.71s | valid loss 1.0361 | valid ppl 2.8182 | learning rate 20.0000
| end of split 40 / 28 | epoch 7 | time: 1003.24s | valid loss 1.0355 | valid ppl 2.8166 | learning rate 20.0000
| end of split 41 / 28 | epoch 7 | time: 3424.69s | valid loss 1.0350 | valid ppl 2.8150 | learning rate 20.0000
| end of split 42 / 28 | epoch 7 | time: 3426.53s | valid loss 1.0353 | valid ppl 2.8159 | learning rate 20.0000
| end of split 43 / 28 | epoch 7 | time: 3426.58s | valid loss 1.0340 | valid ppl 2.8124 | learning rate 20.0000
| end of split 44 / 28 | epoch 7 | time: 3423.52s | valid loss 1.0354 | valid ppl 2.8161 | learning rate 20.0000
| end of split 45 / 28 | epoch 7 | time: 3416.13s | valid loss 1.0329 | valid ppl 2.8093 | learning rate 20.0000
| end of split 46 / 28 | epoch 7 | time: 3412.69s | valid loss 1.0351 | valid ppl 2.8155 | learning rate 20.0000
| end of split 47 / 28 | epoch 7 | time: 3407.49s | valid loss 1.0340 | valid ppl 2.8123 | learning rate 20.0000
| end of split 48 / 28 | epoch 7 | time: 3404.42s | valid loss 1.0327 | valid ppl 2.8086 | learning rate 20.0000
| end of split 49 / 28 | epoch 7 | time: 3400.72s | valid loss 1.0335 | valid ppl 2.8110 | learning rate 20.0000
| end of split 50 / 28 | epoch 7 | time: 3396.61s | valid loss 1.0341 | valid ppl 2.8126 | learning rate 20.0000
| end of split 51 / 28 | epoch 7 | time: 3393.63s | valid loss 1.0351 | valid ppl 2.8155 | learning rate 20.0000
| end of split 52 / 28 | epoch 7 | time: 3387.82s | valid loss 1.0321 | valid ppl 2.8070 | learning rate 20.0000
| end of split 53 / 28 | epoch 7 | time: 3373.80s | valid loss 1.0349 | valid ppl 2.8147 | learning rate 20.0000
| end of split 54 / 28 | epoch 7 | time: 3383.16s | valid loss 1.0321 | valid ppl 2.8069 | learning rate 20.0000
| end of split 55 / 28 | epoch 7 | time: 3385.56s | valid loss 1.0322 | valid ppl 2.8072 | learning rate 20.0000
| end of split 56 / 28 | epoch 7 | time: 3382.85s | valid loss 1.0320 | valid ppl 2.8066 | learning rate 20.0000
| end of split 29 / 28 | epoch 8 | time: 3371.57s | valid loss 1.0326 | valid ppl 2.8084 | learning rate 20.0000
| end of split 30 / 28 | epoch 8 | time: 3382.16s | valid loss 1.0344 | valid ppl 2.8133 | learning rate 20.0000
| end of split 31 / 28 | epoch 8 | time: 3373.30s | valid loss 1.0327 | valid ppl 2.8087 | learning rate 20.0000
| end of split 32 / 28 | epoch 8 | time: 3345.95s | valid loss 1.0332 | valid ppl 2.8102 | learning rate 20.0000
| end of split 33 / 28 | epoch 8 | time: 3368.96s | valid loss 1.0326 | valid ppl 2.8084 | learning rate 20.0000
| end of split 34 / 28 | epoch 8 | time: 3388.68s | valid loss 1.0318 | valid ppl 2.8062 | learning rate 20.0000
| end of split 35 / 28 | epoch 8 | time: 3373.57s | valid loss 1.0336 | valid ppl 2.8113 | learning rate 20.0000
| end of split 36 / 28 | epoch 8 | time: 3375.36s | valid loss 1.0342 | valid ppl 2.8127 | learning rate 20.0000
| end of split 37 / 28 | epoch 8 | time: 3374.28s | valid loss 1.0311 | valid ppl 2.8042 | learning rate 20.0000
| end of split 38 / 28 | epoch 8 | time: 3386.87s | valid loss 1.0321 | valid ppl 2.8070 | learning rate 20.0000
| end of split 39 / 28 | epoch 8 | time: 3385.72s | valid loss 1.0312 | valid ppl 2.8044 | learning rate 20.0000
| end of split 40 / 28 | epoch 8 | time: 991.55s | valid loss 1.0349 | valid ppl 2.8149 | learning rate 20.0000
| end of split 41 / 28 | epoch 8 | time: 3384.06s | valid loss 1.0315 | valid ppl 2.8052 | learning rate 20.0000
| end of split 42 / 28 | epoch 8 | time: 3380.84s | valid loss 1.0332 | valid ppl 2.8102 | learning rate 20.0000
| end of split 43 / 28 | epoch 8 | time: 3372.24s | valid loss 1.0324 | valid ppl 2.8079 | learning rate 20.0000
| end of split 44 / 28 | epoch 8 | time: 3367.32s | valid loss 1.0343 | valid ppl 2.8133 | learning rate 20.0000
| end of split 45 / 28 | epoch 8 | time: 3362.88s | valid loss 1.0305 | valid ppl 2.8026 | learning rate 20.0000
| end of split 46 / 28 | epoch 8 | time: 3352.06s | valid loss 1.0317 | valid ppl 2.8058 | learning rate 20.0000
| end of split 47 / 28 | epoch 8 | time: 5236.04s | valid loss 1.0310 | valid ppl 2.8038 | learning rate 20.0000
| end of split 48 / 28 | epoch 8 | time: 3337.66s | valid loss 1.0318 | valid ppl 2.8061 | learning rate 20.0000
| end of split 49 / 28 | epoch 8 | time: 3352.64s | valid loss 1.0319 | valid ppl 2.8064 | learning rate 20.0000
| end of split 50 / 28 | epoch 8 | time: 3353.74s | valid loss 1.0301 | valid ppl 2.8014 | learning rate 20.0000
| end of split 51 / 28 | epoch 8 | time: 3355.81s | valid loss 1.0329 | valid ppl 2.8092 | learning rate 20.0000
| end of split 52 / 28 | epoch 8 | time: 3345.28s | valid loss 1.0624 | valid ppl 2.8934 | learning rate 20.0000
| end of split 53 / 28 | epoch 8 | time: 3348.72s | valid loss 1.0307 | valid ppl 2.8031 | learning rate 20.0000
| end of split 54 / 28 | epoch 8 | time: 3349.58s | valid loss 1.0310 | valid ppl 2.8040 | learning rate 20.0000
| end of split 55 / 28 | epoch 8 | time: 3348.67s | valid loss 1.0302 | valid ppl 2.8017 | learning rate 20.0000
| end of split 56 / 28 | epoch 8 | time: 3346.76s | valid loss 1.0311 | valid ppl 2.8042 | learning rate 20.0000
| end of split 29 / 28 | epoch 9 | time: 3333.15s | valid loss 1.0323 | valid ppl 2.8076 | learning rate 20.0000
| end of split 30 / 28 | epoch 9 | time: 3355.20s | valid loss 1.0298 | valid ppl 2.8005 | learning rate 20.0000
| end of split 31 / 28 | epoch 9 | time: 3358.57s | valid loss 1.0301 | valid ppl 2.8012 | learning rate 20.0000
| end of split 32 / 28 | epoch 9 | time: 985.22s | valid loss 1.0299 | valid ppl 2.8007 | learning rate 20.0000
| end of split 33 / 28 | epoch 9 | time: 3364.70s | valid loss 1.0308 | valid ppl 2.8033 | learning rate 20.0000
| end of split 34 / 28 | epoch 9 | time: 3358.86s | valid loss 1.0299 | valid ppl 2.8008 | learning rate 20.0000
| end of split 35 / 28 | epoch 9 | time: 3373.80s | valid loss 1.0299 | valid ppl 2.8008 | learning rate 20.0000
| end of split 36 / 28 | epoch 9 | time: 3349.58s | valid loss 1.0294 | valid ppl 2.7993 | learning rate 20.0000
| end of split 37 / 28 | epoch 9 | time: 3363.90s | valid loss 1.0297 | valid ppl 2.8002 | learning rate 20.0000
| end of split 38 / 28 | epoch 9 | time: 3374.51s | valid loss 1.0307 | valid ppl 2.8031 | learning rate 20.0000
| end of split 39 / 28 | epoch 9 | time: 3368.04s | valid loss 1.0285 | valid ppl 2.7968 | learning rate 20.0000
| end of split 40 / 28 | epoch 9 | time: 3367.10s | valid loss 1.0289 | valid ppl 2.7979 | learning rate 20.0000
| end of split 41 / 28 | epoch 9 | time: 3362.55s | valid loss 1.0295 | valid ppl 2.7997 | learning rate 20.0000
| end of split 42 / 28 | epoch 9 | time: 3354.89s | valid loss 1.0287 | valid ppl 2.7975 | learning rate 20.0000
| end of split 43 / 28 | epoch 9 | time: 3351.48s | valid loss 1.0285 | valid ppl 2.7968 | learning rate 20.0000
| end of split 44 / 28 | epoch 9 | time: 3347.75s | valid loss 1.0299 | valid ppl 2.8009 | learning rate 20.0000
| end of split 45 / 28 | epoch 9 | time: 3353.75s | valid loss 1.0280 | valid ppl 2.7956 | learning rate 20.0000
| end of split 46 / 28 | epoch 9 | time: 3340.75s | valid loss 1.0294 | valid ppl 2.7994 | learning rate 20.0000
| end of split 47 / 28 | epoch 9 | time: 3350.77s | valid loss 1.0285 | valid ppl 2.7968 | learning rate 20.0000
| end of split 48 / 28 | epoch 9 | time: 3351.99s | valid loss 1.0278 | valid ppl 2.7948 | learning rate 20.0000
| end of split 49 / 28 | epoch 9 | time: 3341.78s | valid loss 1.0283 | valid ppl 2.7964 | learning rate 20.0000
| end of split 50 / 28 | epoch 9 | time: 3338.92s | valid loss 1.0302 | valid ppl 2.8016 | learning rate 20.0000
| end of split 51 / 28 | epoch 9 | time: 3338.22s | valid loss 1.0293 | valid ppl 2.7991 | learning rate 20.0000
| end of split 52 / 28 | epoch 9 | time: 3348.00s | valid loss 1.0286 | valid ppl 2.7970 | learning rate 20.0000
| end of split 53 / 28 | epoch 9 | time: 3340.37s | valid loss 1.0293 | valid ppl 2.7992 | learning rate 20.0000
| end of split 54 / 28 | epoch 9 | time: 3327.53s | valid loss 1.0279 | valid ppl 2.7951 | learning rate 20.0000
| end of split 55 / 28 | epoch 9 | time: 3335.99s | valid loss 1.0273 | valid ppl 2.7937 | learning rate 20.0000
| end of split 56 / 28 | epoch 9 | time: 13980.53s | valid loss 1.0284 | valid ppl 2.7965 | learning rate 20.0000
| end of split 29 / 28 | epoch 10 | time: 3355.65s | valid loss 1.0281 | valid ppl 2.7959 | learning rate 20.0000
| end of split 30 / 28 | epoch 10 | time: 3366.79s | valid loss 1.0287 | valid ppl 2.7973 | learning rate 20.0000
| end of split 31 / 28 | epoch 10 | time: 3368.82s | valid loss 1.0287 | valid ppl 2.7973 | learning rate 20.0000
| end of split 32 / 28 | epoch 10 | time: 990.06s | valid loss 1.0327 | valid ppl 2.8085 | learning rate 20.0000
| end of split 33 / 28 | epoch 10 | time: 3381.50s | valid loss 1.0277 | valid ppl 2.7948 | learning rate 20.0000
| end of split 34 / 28 | epoch 10 | time: 3384.53s | valid loss 1.0288 | valid ppl 2.7977 | learning rate 20.0000
| end of split 35 / 28 | epoch 10 | time: 3387.23s | valid loss 1.0335 | valid ppl 2.8108 | learning rate 20.0000
| end of split 36 / 28 | epoch 10 | time: 3367.46s | valid loss 1.0284 | valid ppl 2.7967 | learning rate 20.0000
| end of split 37 / 28 | epoch 10 | time: 3381.33s | valid loss 1.0273 | valid ppl 2.7936 | learning rate 20.0000
| end of split 38 / 28 | epoch 10 | time: 3373.72s | valid loss 1.0273 | valid ppl 2.7936 | learning rate 20.0000
| end of split 39 / 28 | epoch 10 | time: 3367.39s | valid loss 1.0228 | valid ppl 2.7810 | learning rate 5.0000
| end of split 40 / 28 | epoch 10 | time: 3365.36s | valid loss 1.0225 | valid ppl 2.7803 | learning rate 5.0000
| end of split 41 / 28 | epoch 10 | time: 3366.86s | valid loss 1.0223 | valid ppl 2.7796 | learning rate 5.0000
| end of split 42 / 28 | epoch 10 | time: 3368.15s | valid loss 1.0223 | valid ppl 2.7796 | learning rate 5.0000
| end of split 43 / 28 | epoch 10 | time: 3362.30s | valid loss 1.0220 | valid ppl 2.7789 | learning rate 5.0000
| end of split 44 / 28 | epoch 10 | time: 3364.78s | valid loss 1.0220 | valid ppl 2.7786 | learning rate 5.0000
| end of split 45 / 28 | epoch 10 | time: 3362.51s | valid loss 1.0219 | valid ppl 2.7784 | learning rate 5.0000
| end of split 46 / 28 | epoch 10 | time: 3366.20s | valid loss 1.0217 | valid ppl 2.7779 | learning rate 5.0000
| end of split 47 / 28 | epoch 10 | time: 3354.09s | valid loss 1.0217 | valid ppl 2.7779 | learning rate 5.0000
| end of split 48 / 28 | epoch 10 | time: 3361.91s | valid loss 1.0217 | valid ppl 2.7778 | learning rate 5.0000
| end of split 49 / 28 | epoch 10 | time: 3359.11s | valid loss 1.0215 | valid ppl 2.7775 | learning rate 5.0000
| end of split 50 / 28 | epoch 10 | time: 3354.83s | valid loss 1.0218 | valid ppl 2.7782 | learning rate 5.0000
| end of split 51 / 28 | epoch 10 | time: 3364.29s | valid loss 1.0215 | valid ppl 2.7773 | learning rate 5.0000
| end of split 52 / 28 | epoch 10 | time: 3387.06s | valid loss 1.0215 | valid ppl 2.7772 | learning rate 5.0000
| end of split 53 / 28 | epoch 10 | time: 3386.93s | valid loss 1.0214 | valid ppl 2.7771 | learning rate 5.0000
| end of split 54 / 28 | epoch 10 | time: 3388.66s | valid loss 1.0212 | valid ppl 2.7766 | learning rate 5.0000
| end of split 55 / 28 | epoch 10 | time: 3386.75s | valid loss 1.0212 | valid ppl 2.7764 | learning rate 5.0000
| end of split 56 / 28 | epoch 10 | time: 3386.25s | valid loss 1.0213 | valid ppl 2.7767 | learning rate 5.0000
| end of split 29 / 28 | epoch 11 | time: 3361.50s | valid loss 1.0212 | valid ppl 2.7765 | learning rate 5.0000
| end of split 30 / 28 | epoch 11 | time: 3388.02s | valid loss 1.0212 | valid ppl 2.7765 | learning rate 5.0000
| end of split 31 / 28 | epoch 11 | time: 3389.23s | valid loss 1.0211 | valid ppl 2.7761 | learning rate 5.0000
| end of split 32 / 28 | epoch 11 | time: 3376.47s | valid loss 1.0210 | valid ppl 2.7760 | learning rate 5.0000
| end of split 33 / 28 | epoch 11 | time: 3378.54s | valid loss 1.0211 | valid ppl 2.7763 | learning rate 5.0000
| end of split 34 / 28 | epoch 11 | time: 3371.86s | valid loss 1.0210 | valid ppl 2.7761 | learning rate 5.0000
| end of split 35 / 28 | epoch 11 | time: 988.54s | valid loss 1.0211 | valid ppl 2.7762 | learning rate 5.0000
| end of split 36 / 28 | epoch 11 | time: 3369.15s | valid loss 1.0210 | valid ppl 2.7761 | learning rate 5.0000
| end of split 37 / 28 | epoch 11 | time: 3362.72s | valid loss 1.0209 | valid ppl 2.7758 | learning rate 5.0000
| end of split 38 / 28 | epoch 11 | time: 3363.26s | valid loss 1.0210 | valid ppl 2.7759 | learning rate 5.0000
| end of split 39 / 28 | epoch 11 | time: 3359.86s | valid loss 1.0210 | valid ppl 2.7760 | learning rate 5.0000
| end of split 40 / 28 | epoch 11 | time: 3338.89s | valid loss 1.0209 | valid ppl 2.7758 | learning rate 5.0000
| end of split 41 / 28 | epoch 11 | time: 3356.02s | valid loss 1.0209 | valid ppl 2.7756 | learning rate 5.0000
| end of split 42 / 28 | epoch 11 | time: 3351.44s | valid loss 1.0208 | valid ppl 2.7753 | learning rate 5.0000
| end of split 43 / 28 | epoch 11 | time: 3350.87s | valid loss 1.0207 | valid ppl 2.7751 | learning rate 5.0000
| end of split 44 / 28 | epoch 11 | time: 3346.91s | valid loss 1.0207 | valid ppl 2.7752 | learning rate 5.0000
| end of split 45 / 28 | epoch 11 | time: 3348.82s | valid loss 1.0206 | valid ppl 2.7749 | learning rate 5.0000
| end of split 46 / 28 | epoch 11 | time: 3348.50s | valid loss 1.0207 | valid ppl 2.7750 | learning rate 5.0000
| end of split 47 / 28 | epoch 11 | time: 3346.52s | valid loss 1.0206 | valid ppl 2.7748 | learning rate 5.0000
| end of split 48 / 28 | epoch 11 | time: 3341.43s | valid loss 1.0206 | valid ppl 2.7748 | learning rate 5.0000
| end of split 49 / 28 | epoch 11 | time: 3342.42s | valid loss 1.0205 | valid ppl 2.7747 | learning rate 5.0000
| end of split 50 / 28 | epoch 11 | time: 3361.90s | valid loss 1.0205 | valid ppl 2.7747 | learning rate 5.0000
| end of split 51 / 28 | epoch 11 | time: 3373.79s | valid loss 1.0205 | valid ppl 2.7745 | learning rate 5.0000
| end of split 52 / 28 | epoch 11 | time: 3380.88s | valid loss 1.0205 | valid ppl 2.7746 | learning rate 5.0000
| end of split 53 / 28 | epoch 11 | time: 3380.44s | valid loss 1.0204 | valid ppl 2.7743 | learning rate 5.0000
| end of split 54 / 28 | epoch 11 | time: 3379.94s | valid loss 1.0204 | valid ppl 2.7743 | learning rate 5.0000
| end of split 55 / 28 | epoch 11 | time: 3379.47s | valid loss 1.0204 | valid ppl 2.7742 | learning rate 5.0000
| end of split 56 / 28 | epoch 11 | time: 3380.66s | valid loss 1.0204 | valid ppl 2.7742 | learning rate 5.0000
| end of split 29 / 28 | epoch 12 | time: 3378.16s | valid loss 1.0206 | valid ppl 2.7749 | learning rate 5.0000
| end of split 30 / 28 | epoch 12 | time: 3397.83s | valid loss 1.0205 | valid ppl 2.7746 | learning rate 5.0000
| end of split 31 / 28 | epoch 12 | time: 3392.19s | valid loss 1.0204 | valid ppl 2.7742 | learning rate 5.0000
| end of split 32 / 28 | epoch 12 | time: 3379.40s | valid loss 1.0204 | valid ppl 2.7743 | learning rate 5.0000
| end of split 33 / 28 | epoch 12 | time: 3373.61s | valid loss 1.0203 | valid ppl 2.7740 | learning rate 5.0000
| end of split 34 / 28 | epoch 12 | time: 3369.09s | valid loss 1.0202 | valid ppl 2.7738 | learning rate 5.0000
| end of split 35 / 28 | epoch 12 | time: 3370.15s | valid loss 1.0202 | valid ppl 2.7738 | learning rate 5.0000
| end of split 36 / 28 | epoch 12 | time: 3364.76s | valid loss 1.0202 | valid ppl 2.7736 | learning rate 5.0000
| end of split 37 / 28 | epoch 12 | time: 3362.81s | valid loss 1.0202 | valid ppl 2.7738 | learning rate 5.0000
| end of split 38 / 28 | epoch 12 | time: 3361.73s | valid loss 1.0201 | valid ppl 2.7736 | learning rate 5.0000
| end of split 39 / 28 | epoch 12 | time: 3362.24s | valid loss 1.0201 | valid ppl 2.7734 | learning rate 5.0000
| end of split 40 / 28 | epoch 12 | time: 3349.23s | valid loss 1.0201 | valid ppl 2.7735 | learning rate 5.0000
| end of split 41 / 28 | epoch 12 | time: 3349.66s | valid loss 1.0200 | valid ppl 2.7732 | learning rate 5.0000
| end of split 42 / 28 | epoch 12 | time: 3354.36s | valid loss 1.0200 | valid ppl 2.7733 | learning rate 5.0000
| end of split 43 / 28 | epoch 12 | time: 3337.30s | valid loss 1.0200 | valid ppl 2.7731 | learning rate 5.0000
| end of split 44 / 28 | epoch 12 | time: 3354.63s | valid loss 1.0200 | valid ppl 2.7733 | learning rate 5.0000
| end of split 45 / 28 | epoch 12 | time: 983.22s | valid loss 1.0200 | valid ppl 2.7732 | learning rate 5.0000
| end of split 46 / 28 | epoch 12 | time: 3353.47s | valid loss 1.0200 | valid ppl 2.7731 | learning rate 5.0000
| end of split 47 / 28 | epoch 12 | time: 3353.04s | valid loss 1.0199 | valid ppl 2.7730 | learning rate 5.0000
| end of split 48 / 28 | epoch 12 | time: 3362.69s | valid loss 1.0200 | valid ppl 2.7731 | learning rate 5.0000
| end of split 49 / 28 | epoch 12 | time: 3392.80s | valid loss 1.0198 | valid ppl 2.7726 | learning rate 5.0000
| end of split 50 / 28 | epoch 12 | time: 3394.63s | valid loss 1.0198 | valid ppl 2.7727 | learning rate 5.0000
| end of split 51 / 28 | epoch 12 | time: 3382.77s | valid loss 1.0199 | valid ppl 2.7728 | learning rate 5.0000
| end of split 52 / 28 | epoch 12 | time: 3385.26s | valid loss 1.0199 | valid ppl 2.7729 | learning rate 5.0000
| end of split 53 / 28 | epoch 12 | time: 3384.68s | valid loss 1.0198 | valid ppl 2.7725 | learning rate 5.0000
| end of split 54 / 28 | epoch 12 | time: 3381.93s | valid loss 1.0198 | valid ppl 2.7726 | learning rate 5.0000
| end of split 55 / 28 | epoch 12 | time: 3398.40s | valid loss 1.0197 | valid ppl 2.7723 | learning rate 5.0000
| end of split 56 / 28 | epoch 12 | time: 3396.09s | valid loss 1.0198 | valid ppl 2.7726 | learning rate 5.0000
| end of split 29 / 28 | epoch 13 | time: 3377.87s | valid loss 1.0197 | valid ppl 2.7723 | learning rate 5.0000
| end of split 30 / 28 | epoch 13 | time: 3374.68s | valid loss 1.0196 | valid ppl 2.7721 | learning rate 5.0000
| end of split 31 / 28 | epoch 13 | time: 3387.69s | valid loss 1.0196 | valid ppl 2.7722 | learning rate 5.0000
| end of split 32 / 28 | epoch 13 | time: 990.82s | valid loss 1.0197 | valid ppl 2.7723 | learning rate 5.0000
| end of split 33 / 28 | epoch 13 | time: 3369.69s | valid loss 1.0195 | valid ppl 2.7719 | learning rate 5.0000
| end of split 34 / 28 | epoch 13 | time: 3370.78s | valid loss 1.0197 | valid ppl 2.7723 | learning rate 5.0000
| end of split 35 / 28 | epoch 13 | time: 3370.56s | valid loss 1.0195 | valid ppl 2.7719 | learning rate 5.0000
| end of split 36 / 28 | epoch 13 | time: 3360.93s | valid loss 1.0195 | valid ppl 2.7719 | learning rate 5.0000
| end of split 37 / 28 | epoch 13 | time: 3361.03s | valid loss 1.0196 | valid ppl 2.7720 | learning rate 5.0000
| end of split 38 / 28 | epoch 13 | time: 3361.72s | valid loss 1.0196 | valid ppl 2.7722 | learning rate 5.0000
| end of split 39 / 28 | epoch 13 | time: 3350.87s | valid loss 1.0195 | valid ppl 2.7718 | learning rate 5.0000
| end of split 40 / 28 | epoch 13 | time: 3347.90s | valid loss 1.0195 | valid ppl 2.7718 | learning rate 5.0000
| end of split 41 / 28 | epoch 13 | time: 3345.82s | valid loss 1.0197 | valid ppl 2.7722 | learning rate 5.0000
| end of split 42 / 28 | epoch 13 | time: 3354.18s | valid loss 1.0194 | valid ppl 2.7716 | learning rate 5.0000
| end of split 43 / 28 | epoch 13 | time: 3350.06s | valid loss 1.0203 | valid ppl 2.7741 | learning rate 5.0000
| end of split 44 / 28 | epoch 13 | time: 3348.70s | valid loss 1.0194 | valid ppl 2.7716 | learning rate 5.0000
| end of split 45 / 28 | epoch 13 | time: 3351.28s | valid loss 1.0194 | valid ppl 2.7714 | learning rate 5.0000
| end of split 46 / 28 | epoch 13 | time: 3347.01s | valid loss 1.0194 | valid ppl 2.7714 | learning rate 5.0000
| end of split 47 / 28 | epoch 13 | time: 3338.57s | valid loss 1.0193 | valid ppl 2.7713 | learning rate 5.0000
| end of split 48 / 28 | epoch 13 | time: 3246.27s | valid loss 1.0192 | valid ppl 2.7711 | learning rate 5.0000
| end of split 49 / 28 | epoch 13 | time: 912.45s | valid loss 1.0192 | valid ppl 2.7711 | learning rate 5.0000
| end of split 50 / 28 | epoch 13 | time: 3234.34s | valid loss 1.0193 | valid ppl 2.7713 | learning rate 5.0000
| end of split 51 / 28 | epoch 13 | time: 3244.24s | valid loss 1.0192 | valid ppl 2.7710 | learning rate 5.0000
| end of split 52 / 28 | epoch 13 | time: 3244.31s | valid loss 1.0192 | valid ppl 2.7710 | learning rate 5.0000
| end of split 53 / 28 | epoch 13 | time: 3242.26s | valid loss 1.0198 | valid ppl 2.7727 | learning rate 5.0000
| end of split 54 / 28 | epoch 13 | time: 3242.91s | valid loss 1.0192 | valid ppl 2.7710 | learning rate 5.0000
| end of split 55 / 28 | epoch 13 | time: 3244.44s | valid loss 1.0191 | valid ppl 2.7707 | learning rate 5.0000
| end of split 56 / 28 | epoch 13 | time: 3242.99s | valid loss 1.0191 | valid ppl 2.7707 | learning rate 5.0000
| end of split 57 / 28 | epoch 13 | time: 3246.09s | valid loss 1.0191 | valid ppl 2.7707 | learning rate 5.0000
| end of split 58 / 28 | epoch 13 | time: 3234.45s | valid loss 1.0191 | valid ppl 2.7706 | learning rate 5.0000
| end of split 59 / 28 | epoch 13 | time: 3234.01s | valid loss 1.0192 | valid ppl 2.7708 | learning rate 5.0000
| end of split 60 / 28 | epoch 13 | time: 3232.69s | valid loss 1.0190 | valid ppl 2.7705 | learning rate 5.0000
| end of split 61 / 28 | epoch 13 | time: 3242.40s | valid loss 1.0193 | valid ppl 2.7712 | learning rate 5.0000
| end of split 62 / 28 | epoch 13 | time: 3241.93s | valid loss 1.0190 | valid ppl 2.7704 | learning rate 5.0000
| end of split 63 / 28 | epoch 13 | time: 3246.64s | valid loss 1.0191 | valid ppl 2.7706 | learning rate 5.0000
| end of split 64 / 28 | epoch 13 | time: 3245.70s | valid loss 1.0190 | valid ppl 2.7703 | learning rate 5.0000
| end of split 65 / 28 | epoch 13 | time: 3245.13s | valid loss 1.0189 | valid ppl 2.7701 | learning rate 5.0000
| end of split 66 / 28 | epoch 13 | time: 3243.63s | valid loss 1.0189 | valid ppl 2.7702 | learning rate 5.0000
| end of split 67 / 28 | epoch 13 | time: 3251.70s | valid loss 1.0191 | valid ppl 2.7707 | learning rate 5.0000
| end of split 68 / 28 | epoch 13 | time: 3249.44s | valid loss 1.0192 | valid ppl 2.7710 | learning rate 5.0000
| end of split 69 / 28 | epoch 13 | time: 3258.90s | valid loss 1.0189 | valid ppl 2.7701 | learning rate 5.0000
| end of split 70 / 28 | epoch 13 | time: 3259.22s | valid loss 1.0189 | valid ppl 2.7701 | learning rate 5.0000
| end of split 71 / 28 | epoch 13 | time: 3262.28s | valid loss 1.0189 | valid ppl 2.7701 | learning rate 5.0000
| end of split 72 / 28 | epoch 13 | time: 3261.10s | valid loss 1.0188 | valid ppl 2.7699 | learning rate 5.0000
| end of split 73 / 28 | epoch 13 | time: 3295.78s | valid loss 1.0188 | valid ppl 2.7699 | learning rate 5.0000
| end of split 74 / 28 | epoch 13 | time: 3298.17s | valid loss 1.0188 | valid ppl 2.7698 | learning rate 5.0000
| end of split 75 / 28 | epoch 13 | time: 3297.19s | valid loss 1.0189 | valid ppl 2.7701 | learning rate 5.0000
| end of split 76 / 28 | epoch 13 | time: 3294.64s | valid loss 1.0187 | valid ppl 2.7697 | learning rate 5.0000
| end of split 49 / 28 | epoch 14 | time: 3258.86s | valid loss 1.0188 | valid ppl 2.7697 | learning rate 5.0000
| end of split 50 / 28 | epoch 14 | time: 3284.30s | valid loss 1.0188 | valid ppl 2.7700 | learning rate 5.0000
| end of split 51 / 28 | epoch 14 | time: 3280.96s | valid loss 1.0187 | valid ppl 2.7696 | learning rate 5.0000
| end of split 52 / 28 | epoch 14 | time: 964.31s | valid loss 1.0187 | valid ppl 2.7695 | learning rate 5.0000
| end of split 53 / 28 | epoch 14 | time: 3306.49s | valid loss 1.0187 | valid ppl 2.7697 | learning rate 5.0000
| end of split 54 / 28 | epoch 14 | time: 3241.31s | valid loss 1.0186 | valid ppl 2.7694 | learning rate 5.0000
| end of split 55 / 28 | epoch 14 | time: 3289.83s | valid loss 1.0186 | valid ppl 2.7694 | learning rate 5.0000
| end of split 56 / 28 | epoch 14 | time: 3329.41s | valid loss 1.0187 | valid ppl 2.7696 | learning rate 5.0000
| end of split 57 / 28 | epoch 14 | time: 3305.55s | valid loss 1.0186 | valid ppl 2.7694 | learning rate 5.0000
| end of split 58 / 28 | epoch 14 | time: 3330.67s | valid loss 1.0187 | valid ppl 2.7696 | learning rate 5.0000
| end of split 59 / 28 | epoch 14 | time: 3292.81s | valid loss 1.0186 | valid ppl 2.7692 | learning rate 5.0000
| end of split 60 / 28 | epoch 14 | time: 3264.46s | valid loss 1.0185 | valid ppl 2.7691 | learning rate 5.0000
| end of split 61 / 28 | epoch 14 | time: 3315.30s | valid loss 1.0186 | valid ppl 2.7693 | learning rate 5.0000
| end of split 62 / 28 | epoch 14 | time: 3370.03s | valid loss 1.0186 | valid ppl 2.7693 | learning rate 5.0000
| end of split 63 / 28 | epoch 14 | time: 3376.57s | valid loss 1.0187 | valid ppl 2.7695 | learning rate 5.0000
| end of split 64 / 28 | epoch 14 | time: 3377.85s | valid loss 1.0185 | valid ppl 2.7690 | learning rate 5.0000
| end of split 65 / 28 | epoch 14 | time: 3375.37s | valid loss 1.0187 | valid ppl 2.7695 | learning rate 5.0000
| end of split 66 / 28 | epoch 14 | time: 3410.19s | valid loss 1.0184 | valid ppl 2.7689 | learning rate 5.0000
| end of split 67 / 28 | epoch 14 | time: 3408.62s | valid loss 1.0185 | valid ppl 2.7690 | learning rate 5.0000
| end of split 68 / 28 | epoch 14 | time: 997.06s | valid loss 1.0187 | valid ppl 2.7695 | learning rate 5.0000
| end of split 69 / 28 | epoch 14 | time: 3144.91s | valid loss 1.0185 | valid ppl 2.7692 | learning rate 5.0000
| end of split 70 / 28 | epoch 14 | time: 3324.27s | valid loss 1.0184 | valid ppl 2.7687 | learning rate 5.0000
| end of split 71 / 28 | epoch 14 | time: 3377.70s | valid loss 1.0183 | valid ppl 2.7686 | learning rate 5.0000
| end of split 72 / 28 | epoch 14 | time: 3379.24s | valid loss 1.0184 | valid ppl 2.7689 | learning rate 5.0000
| end of split 73 / 28 | epoch 14 | time: 3373.14s | valid loss 1.0184 | valid ppl 2.7687 | learning rate 5.0000
| end of split 74 / 28 | epoch 14 | time: 3342.04s | valid loss 1.0183 | valid ppl 2.7685 | learning rate 5.0000
| end of split 75 / 28 | epoch 14 | time: 3338.23s | valid loss 1.0182 | valid ppl 2.7683 | learning rate 5.0000
| end of split 76 / 28 | epoch 14 | time: 3336.11s | valid loss 1.0183 | valid ppl 2.7685 | learning rate 5.0000
| end of split 77 / 28 | epoch 14 | time: 3338.18s | valid loss 1.0183 | valid ppl 2.7685 | learning rate 5.0000
| end of split 78 / 28 | epoch 14 | time: 3336.34s | valid loss 1.0182 | valid ppl 2.7682 | learning rate 5.0000
| end of split 79 / 28 | epoch 14 | time: 3294.94s | valid loss 1.0182 | valid ppl 2.7682 | learning rate 5.0000
| end of split 80 / 28 | epoch 14 | time: 3131.17s | valid loss 1.0182 | valid ppl 2.7682 | learning rate 5.0000
| end of split 81 / 28 | epoch 14 | time: 3078.59s | valid loss 1.0182 | valid ppl 2.7683 | learning rate 5.0000
| end of split 82 / 28 | epoch 14 | time: 3077.37s | valid loss 1.0183 | valid ppl 2.7683 | learning rate 5.0000
| end of split 83 / 28 | epoch 14 | time: 3079.84s | valid loss 1.0182 | valid ppl 2.7681 | learning rate 5.0000
| end of split 84 / 28 | epoch 14 | time: 3077.24s | valid loss 1.0181 | valid ppl 2.7679 | learning rate 5.0000
| end of split 85 / 28 | epoch 14 | time: 3074.62s | valid loss 1.0181 | valid ppl 2.7680 | learning rate 5.0000
| end of split 86 / 28 | epoch 14 | time: 3074.32s | valid loss 1.0182 | valid ppl 2.7681 | learning rate 5.0000
| end of split 87 / 28 | epoch 14 | time: 3076.92s | valid loss 1.0181 | valid ppl 2.7679 | learning rate 5.0000
| end of split 60 / 28 | epoch 15 | time: 3077.51s | valid loss 1.0180 | valid ppl 2.7676 | learning rate 5.0000
| end of split 61 / 28 | epoch 15 | time: 3082.58s | valid loss 1.0180 | valid ppl 2.7678 | learning rate 5.0000
| end of split 62 / 28 | epoch 15 | time: 3260.22s | valid loss 1.0180 | valid ppl 2.7677 | learning rate 5.0000
| end of split 63 / 28 | epoch 15 | time: 3269.18s | valid loss 1.0180 | valid ppl 2.7676 | learning rate 5.0000
| end of split 64 / 28 | epoch 15 | time: 3272.98s | valid loss 1.0180 | valid ppl 2.7677 | learning rate 5.0000
| end of split 65 / 28 | epoch 15 | time: 3269.49s | valid loss 1.0180 | valid ppl 2.7677 | learning rate 5.0000
| end of split 66 / 28 | epoch 15 | time: 3275.29s | valid loss 1.0179 | valid ppl 2.7675 | learning rate 5.0000
| end of split 67 / 28 | epoch 15 | time: 3272.16s | valid loss 1.0181 | valid ppl 2.7679 | learning rate 5.0000
| end of split 68 / 28 | epoch 15 | time: 3271.56s | valid loss 1.0178 | valid ppl 2.7671 | learning rate 5.0000
| end of split 69 / 28 | epoch 15 | time: 3268.41s | valid loss 1.0178 | valid ppl 2.7672 | learning rate 5.0000
| end of split 70 / 28 | epoch 15 | time: 3262.53s | valid loss 1.0179 | valid ppl 2.7672 | learning rate 5.0000
| end of split 71 / 28 | epoch 15 | time: 3262.93s | valid loss 1.0178 | valid ppl 2.7672 | learning rate 5.0000
| end of split 72 / 28 | epoch 15 | time: 3257.22s | valid loss 1.0178 | valid ppl 2.7671 | learning rate 5.0000
| end of split 73 / 28 | epoch 15 | time: 3258.02s | valid loss 1.0178 | valid ppl 2.7672 | learning rate 5.0000
| end of split 74 / 28 | epoch 15 | time: 3249.01s | valid loss 1.0179 | valid ppl 2.7672 | learning rate 5.0000
| end of split 75 / 28 | epoch 15 | time: 3244.44s | valid loss 1.0178 | valid ppl 2.7672 | learning rate 5.0000
| end of split 76 / 28 | epoch 15 | time: 3244.47s | valid loss 1.0179 | valid ppl 2.7673 | learning rate 5.0000
| end of split 77 / 28 | epoch 15 | time: 28086.05s | valid loss 1.0178 | valid ppl 2.7672 | learning rate 5.0000
| end of split 78 / 28 | epoch 15 | time: 3236.09s | valid loss 1.0178 | valid ppl 2.7670 | learning rate 5.0000
| end of split 79 / 28 | epoch 15 | time: 3279.30s | valid loss 1.0177 | valid ppl 2.7668 | learning rate 5.0000
| end of split 80 / 28 | epoch 15 | time: 3279.10s | valid loss 1.0176 | valid ppl 2.7666 | learning rate 5.0000
| end of split 81 / 28 | epoch 15 | time: 3280.87s | valid loss 1.0177 | valid ppl 2.7669 | learning rate 5.0000
| end of split 82 / 28 | epoch 15 | time: 3244.65s | valid loss 1.0177 | valid ppl 2.7669 | learning rate 5.0000
| end of split 83 / 28 | epoch 15 | time: 3250.27s | valid loss 1.0180 | valid ppl 2.7676 | learning rate 5.0000
| end of split 84 / 28 | epoch 15 | time: 3249.29s | valid loss 1.0177 | valid ppl 2.7669 | learning rate 5.0000
| end of split 85 / 28 | epoch 15 | time: 3252.40s | valid loss 1.0177 | valid ppl 2.7667 | learning rate 5.0000
| end of split 86 / 28 | epoch 15 | time: 3247.67s | valid loss 1.0176 | valid ppl 2.7667 | learning rate 5.0000
| end of split 87 / 28 | epoch 15 | time: 3246.63s | valid loss 1.0176 | valid ppl 2.7667 | learning rate 5.0000
| end of split 60 / 28 | epoch 16 | time: 3260.69s | valid loss 1.0176 | valid ppl 2.7665 | learning rate 5.0000
| end of split 61 / 28 | epoch 16 | time: 3268.81s | valid loss 1.0176 | valid ppl 2.7665 | learning rate 5.0000
| end of split 62 / 28 | epoch 16 | time: 3262.69s | valid loss 1.0176 | valid ppl 2.7665 | learning rate 5.0000
| end of split 63 / 28 | epoch 16 | time: 3259.83s | valid loss 1.0176 | valid ppl 2.7664 | learning rate 5.0000
| end of split 64 / 28 | epoch 16 | time: 3248.84s | valid loss 1.0176 | valid ppl 2.7665 | learning rate 5.0000
| end of split 65 / 28 | epoch 16 | time: 3254.15s | valid loss 1.0174 | valid ppl 2.7661 | learning rate 5.0000
| end of split 66 / 28 | epoch 16 | time: 3250.93s | valid loss 1.0175 | valid ppl 2.7663 | learning rate 5.0000
| end of split 67 / 28 | epoch 16 | time: 3248.51s | valid loss 1.0174 | valid ppl 2.7661 | learning rate 5.0000
| end of split 68 / 28 | epoch 16 | time: 3249.54s | valid loss 1.0175 | valid ppl 2.7662 | learning rate 5.0000
| end of split 69 / 28 | epoch 16 | time: 3250.81s | valid loss 1.0176 | valid ppl 2.7666 | learning rate 5.0000
| end of split 70 / 28 | epoch 16 | time: 3247.34s | valid loss 1.0175 | valid ppl 2.7662 | learning rate 5.0000
| end of split 71 / 28 | epoch 16 | time: 3241.61s | valid loss 1.0174 | valid ppl 2.7661 | learning rate 5.0000
| end of split 72 / 28 | epoch 16 | time: 3241.40s | valid loss 1.0175 | valid ppl 2.7662 | learning rate 5.0000
| end of split 73 / 28 | epoch 16 | time: 3241.41s | valid loss 1.0174 | valid ppl 2.7661 | learning rate 5.0000
| end of split 74 / 28 | epoch 16 | time: 3238.90s | valid loss 1.0174 | valid ppl 2.7660 | learning rate 5.0000
| end of split 75 / 28 | epoch 16 | time: 950.51s | valid loss 1.0173 | valid ppl 2.7658 | learning rate 5.0000
| end of split 76 / 28 | epoch 16 | time: 3246.27s | valid loss 1.0175 | valid ppl 2.7663 | learning rate 5.0000
| end of split 77 / 28 | epoch 16 | time: 3262.21s | valid loss 1.0167 | valid ppl 2.7641 | learning rate 1.2500
| end of split 78 / 28 | epoch 16 | time: 3268.11s | valid loss 1.0166 | valid ppl 2.7639 | learning rate 1.2500
| end of split 79 / 28 | epoch 16 | time: 3269.38s | valid loss 1.0167 | valid ppl 2.7639 | learning rate 1.2500
| end of split 80 / 28 | epoch 16 | time: 3269.47s | valid loss 1.0166 | valid ppl 2.7638 | learning rate 1.2500
| end of split 81 / 28 | epoch 16 | time: 3258.11s | valid loss 1.0166 | valid ppl 2.7637 | learning rate 1.2500
| end of split 82 / 28 | epoch 16 | time: 3255.05s | valid loss 1.0165 | valid ppl 2.7636 | learning rate 1.2500
| end of split 83 / 28 | epoch 16 | time: 3267.27s | valid loss 1.0165 | valid ppl 2.7635 | learning rate 1.2500
| end of split 84 / 28 | epoch 16 | time: 3267.39s | valid loss 1.0165 | valid ppl 2.7634 | learning rate 1.2500
| end of split 85 / 28 | epoch 16 | time: 3268.88s | valid loss 1.0165 | valid ppl 2.7635 | learning rate 1.2500
| end of split 86 / 28 | epoch 16 | time: 3271.95s | valid loss 1.0165 | valid ppl 2.7634 | learning rate 1.2500
| end of split 87 / 28 | epoch 16 | time: 3211.88s | valid loss 1.0165 | valid ppl 2.7635 | learning rate 1.2500
| end of split 60 / 28 | epoch 17 | time: 3260.00s | valid loss 1.0164 | valid ppl 2.7632 | learning rate 1.2500
| end of split 61 / 28 | epoch 17 | time: 3331.83s | valid loss 1.0164 | valid ppl 2.7632 | learning rate 1.2500
| end of split 62 / 28 | epoch 17 | time: 3266.91s | valid loss 1.0164 | valid ppl 2.7632 | learning rate 1.2500
| end of split 63 / 28 | epoch 17 | time: 3267.75s | valid loss 1.0164 | valid ppl 2.7633 | learning rate 1.2500
| end of split 64 / 28 | epoch 17 | time: 3265.47s | valid loss 1.0164 | valid ppl 2.7633 | learning rate 1.2500
| end of split 65 / 28 | epoch 17 | time: 3253.35s | valid loss 1.0164 | valid ppl 2.7632 | learning rate 1.2500
| end of split 66 / 28 | epoch 17 | time: 3263.04s | valid loss 1.0164 | valid ppl 2.7633 | learning rate 1.2500
| end of split 67 / 28 | epoch 17 | time: 3265.59s | valid loss 1.0164 | valid ppl 2.7633 | learning rate 1.2500
| end of split 68 / 28 | epoch 17 | time: 3264.33s | valid loss 1.0164 | valid ppl 2.7631 | learning rate 1.2500
| end of split 69 / 28 | epoch 17 | time: 3256.57s | valid loss 1.0164 | valid ppl 2.7632 | learning rate 1.2500
| end of split 70 / 28 | epoch 17 | time: 3257.38s | valid loss 1.0163 | valid ppl 2.7629 | learning rate 0.3125
| end of split 71 / 28 | epoch 17 | time: 3250.94s | valid loss 1.0163 | valid ppl 2.7629 | learning rate 0.3125
| end of split 72 / 28 | epoch 17 | time: 3250.33s | valid loss 1.0163 | valid ppl 2.7628 | learning rate 0.3125
| end of split 73 / 28 | epoch 17 | time: 3258.39s | valid loss 1.0162 | valid ppl 2.7628 | learning rate 0.3125
| end of split 74 / 28 | epoch 17 | time: 3256.39s | valid loss 1.0162 | valid ppl 2.7627 | learning rate 0.3125
| end of split 75 / 28 | epoch 17 | time: 956.20s | valid loss 1.0162 | valid ppl 2.7627 | learning rate 0.3125
| end of split 76 / 28 | epoch 17 | time: 3275.60s | valid loss 1.0162 | valid ppl 2.7627 | learning rate 0.3125
| end of split 77 / 28 | epoch 17 | time: 3281.88s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.3125
| end of split 78 / 28 | epoch 17 | time: 3282.88s | valid loss 1.0162 | valid ppl 2.7627 | learning rate 0.3125
| end of split 79 / 28 | epoch 17 | time: 3281.60s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.3125
| end of split 80 / 28 | epoch 17 | time: 3282.62s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.3125
| end of split 81 / 28 | epoch 17 | time: 3287.94s | valid loss 1.0162 | valid ppl 2.7627 | learning rate 0.3125
| end of split 82 / 28 | epoch 17 | time: 3278.46s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0781
| end of split 83 / 28 | epoch 17 | time: 3290.21s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0781
| end of split 84 / 28 | epoch 17 | time: 3296.94s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0781
| end of split 85 / 28 | epoch 17 | time: 3201.69s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 86 / 28 | epoch 17 | time: 26632.43s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 87 / 28 | epoch 17 | time: 3289.17s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 60 / 28 | epoch 18 | time: 3276.56s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 61 / 28 | epoch 18 | time: 3295.10s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 62 / 28 | epoch 18 | time: 3294.42s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 63 / 28 | epoch 18 | time: 3260.23s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 64 / 28 | epoch 18 | time: 3260.96s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 65 / 28 | epoch 18 | time: 3265.06s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0781
| end of split 66 / 28 | epoch 18 | time: 3269.36s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 67 / 28 | epoch 18 | time: 3269.99s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 68 / 28 | epoch 18 | time: 3267.74s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 69 / 28 | epoch 18 | time: 3276.93s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 70 / 28 | epoch 18 | time: 3280.38s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 71 / 28 | epoch 18 | time: 3281.48s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 72 / 28 | epoch 18 | time: 3269.65s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 73 / 28 | epoch 18 | time: 3276.15s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 74 / 28 | epoch 18 | time: 3286.28s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 75 / 28 | epoch 18 | time: 3286.31s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 76 / 28 | epoch 18 | time: 3289.19s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0195
| end of split 77 / 28 | epoch 18 | time: 3280.75s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0049
| end of split 78 / 28 | epoch 18 | time: 3272.43s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0049
| end of split 79 / 28 | epoch 18 | time: 3270.42s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0049
| end of split 80 / 28 | epoch 18 | time: 3260.89s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0049
| end of split 81 / 28 | epoch 18 | time: 958.00s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0049
| end of split 82 / 28 | epoch 18 | time: 3270.36s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0049
| end of split 83 / 28 | epoch 18 | time: 3265.08s | valid loss 1.0161 | valid ppl 2.7625 | learning rate 0.0049
| end of split 84 / 28 | epoch 18 | time: 3109.41s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 85 / 28 | epoch 18 | time: 3242.24s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 86 / 28 | epoch 18 | time: 3245.23s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 87 / 28 | epoch 18 | time: 3244.78s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 88 / 28 | epoch 18 | time: 3241.61s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 89 / 28 | epoch 18 | time: 3196.37s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 90 / 28 | epoch 18 | time: 3215.04s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 91 / 28 | epoch 18 | time: 3226.07s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 92 / 28 | epoch 18 | time: 3222.68s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 93 / 28 | epoch 18 | time: 3224.21s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 94 / 28 | epoch 18 | time: 3225.28s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 95 / 28 | epoch 18 | time: 3229.80s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 96 / 28 | epoch 18 | time: 3226.91s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 97 / 28 | epoch 18 | time: 3230.45s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 98 / 28 | epoch 18 | time: 3238.09s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 99 / 28 | epoch 18 | time: 3154.25s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 100 / 28 | epoch 18 | time: 3070.87s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0049
| end of split 101 / 28 | epoch 18 | time: 3198.30s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 102 / 28 | epoch 18 | time: 3225.92s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 103 / 28 | epoch 18 | time: 3226.40s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 104 / 28 | epoch 18 | time: 3224.60s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 105 / 28 | epoch 18 | time: 3226.68s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 106 / 28 | epoch 18 | time: 3205.24s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 107 / 28 | epoch 18 | time: 3071.76s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 108 / 28 | epoch 18 | time: 3071.41s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 109 / 28 | epoch 18 | time: 3070.93s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 110 / 28 | epoch 18 | time: 3186.79s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 111 / 28 | epoch 18 | time: 3220.26s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0012
| end of split 112 / 28 | epoch 18 | time: 3219.30s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 113 / 28 | epoch 18 | time: 3221.23s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 114 / 28 | epoch 18 | time: 3222.67s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 115 / 28 | epoch 18 | time: 943.78s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 116 / 28 | epoch 18 | time: 3223.86s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 89 / 28 | epoch 19 | time: 3233.85s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 90 / 28 | epoch 19 | time: 3244.63s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 91 / 28 | epoch 19 | time: 3210.64s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 92 / 28 | epoch 19 | time: 3255.17s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 93 / 28 | epoch 19 | time: 3253.87s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 94 / 28 | epoch 19 | time: 3228.56s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0003
| end of split 95 / 28 | epoch 19 | time: 3228.92s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 96 / 28 | epoch 19 | time: 3228.20s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 97 / 28 | epoch 19 | time: 3225.83s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 98 / 28 | epoch 19 | time: 3230.12s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 99 / 28 | epoch 19 | time: 3230.77s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 100 / 28 | epoch 19 | time: 3230.30s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 101 / 28 | epoch 19 | time: 3229.89s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 102 / 28 | epoch 19 | time: 3231.94s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 103 / 28 | epoch 19 | time: 3232.25s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 104 / 28 | epoch 19 | time: 946.84s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 105 / 28 | epoch 19 | time: 3236.58s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0001
| end of split 106 / 28 | epoch 19 | time: 3231.49s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 107 / 28 | epoch 19 | time: 3061.65s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 108 / 28 | epoch 19 | time: 3068.41s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 109 / 28 | epoch 19 | time: 3171.54s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 110 / 28 | epoch 19 | time: 3219.31s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 111 / 28 | epoch 19 | time: 3223.33s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 112 / 28 | epoch 19 | time: 3234.39s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 113 / 28 | epoch 19 | time: 3216.95s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 114 / 28 | epoch 19 | time: 3234.14s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 115 / 28 | epoch 19 | time: 3224.71s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 116 / 28 | epoch 19 | time: 3226.63s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 117 / 28 | epoch 19 | time: 3229.48s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 118 / 28 | epoch 19 | time: 3232.38s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 119 / 28 | epoch 19 | time: 3237.27s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 120 / 28 | epoch 19 | time: 3237.08s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 121 / 28 | epoch 19 | time: 3240.62s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 122 / 28 | epoch 19 | time: 3235.57s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 123 / 28 | epoch 19 | time: 3238.57s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 124 / 28 | epoch 19 | time: 3256.40s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 125 / 28 | epoch 19 | time: 3248.44s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 126 / 28 | epoch 19 | time: 3256.45s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 127 / 28 | epoch 19 | time: 3256.73s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 128 / 28 | epoch 19 | time: 955.13s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 129 / 28 | epoch 19 | time: 3257.28s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 130 / 28 | epoch 19 | time: 3265.30s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 131 / 28 | epoch 19 | time: 3263.36s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 132 / 28 | epoch 19 | time: 3263.01s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 133 / 28 | epoch 19 | time: 3259.27s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 134 / 28 | epoch 19 | time: 3259.60s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 107 / 28 | epoch 20 | time: 3283.80s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 108 / 28 | epoch 20 | time: 3297.21s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 109 / 28 | epoch 20 | time: 966.27s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 110 / 28 | epoch 20 | time: 3301.56s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 111 / 28 | epoch 20 | time: 3290.84s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 112 / 28 | epoch 20 | time: 3307.23s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 113 / 28 | epoch 20 | time: 3308.71s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 114 / 28 | epoch 20 | time: 3255.85s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 115 / 28 | epoch 20 | time: 3280.88s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 116 / 28 | epoch 20 | time: 3278.73s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 117 / 28 | epoch 20 | time: 3281.74s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 118 / 28 | epoch 20 | time: 3287.28s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 119 / 28 | epoch 20 | time: 3292.36s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 120 / 28 | epoch 20 | time: 3299.88s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 121 / 28 | epoch 20 | time: 3301.06s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 122 / 28 | epoch 20 | time: 3304.04s | valid loss 1.0161 | valid ppl 2.7624 | learning rate 0.0000
| end of split 123 / 28 | epoch 20 | time: 3099.94s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 124 / 28 | epoch 20 | time: 3300.51s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 125 / 28 | epoch 20 | time: 3294.64s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 126 / 28 | epoch 20 | time: 3294.28s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 127 / 28 | epoch 20 | time: 3308.93s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 128 / 28 | epoch 20 | time: 3311.04s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 129 / 28 | epoch 20 | time: 3312.94s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 130 / 28 | epoch 20 | time: 3310.97s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 131 / 28 | epoch 20 | time: 3315.08s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 132 / 28 | epoch 20 | time: 3299.36s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 133 / 28 | epoch 20 | time: 3303.61s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 134 / 28 | epoch 20 | time: 3310.19s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 135 / 28 | epoch 20 | time: 969.87s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 136 / 28 | epoch 20 | time: 3308.41s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 137 / 28 | epoch 20 | time: 3302.14s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 138 / 28 | epoch 20 | time: 3316.65s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 139 / 28 | epoch 20 | time: 3315.87s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 140 / 28 | epoch 20 | time: 3309.19s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 141 / 28 | epoch 20 | time: 3306.84s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 142 / 28 | epoch 20 | time: 3322.57s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 143 / 28 | epoch 20 | time: 3326.13s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 144 / 28 | epoch 20 | time: 3326.71s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 145 / 28 | epoch 20 | time: 3332.68s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 146 / 28 | epoch 20 | time: 3331.75s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 147 / 28 | epoch 20 | time: 3333.64s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 148 / 28 | epoch 20 | time: 3329.99s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 149 / 28 | epoch 20 | time: 3329.76s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
| end of split 150 / 28 | epoch 20 | time: 3328.06s | valid loss 1.0162 | valid ppl 2.7626 | learning rate 0.0000
TEST: valid loss 1.0162 | valid ppl 2.7626