| end of split 1 /113 | epoch 1 | time: 224.45s | valid loss 7.6183 | valid ppl 2035.0861 | learning rate 5.0000 | end of split 2 /113 | epoch 1 | time: 229.45s | valid loss 7.3864 | valid ppl 1613.9065 | learning rate 5.0000 | end of split 3 /113 | epoch 1 | time: 239.40s | valid loss 7.3424 | valid ppl 1544.3504 | learning rate 5.0000 | end of split 4 /113 | epoch 1 | time: 233.67s | valid loss 7.2568 | valid ppl 1417.6838 | learning rate 5.0000 | end of split 5 /113 | epoch 1 | time: 227.57s | valid loss 7.2848 | valid ppl 1458.0133 | learning rate 5.0000 | end of split 6 /113 | epoch 1 | time: 235.49s | valid loss 7.2458 | valid ppl 1402.2080 | learning rate 5.0000 | end of split 7 /113 | epoch 1 | time: 235.14s | valid loss 7.2137 | valid ppl 1357.8841 | learning rate 5.0000 | end of split 8 /113 | epoch 1 | time: 238.90s | valid loss 7.1989 | valid ppl 1337.9002 | learning rate 5.0000 | end of split 9 /113 | epoch 1 | time: 228.81s | valid loss 7.1782 | valid ppl 1310.5202 | learning rate 5.0000 | end of split 10 /113 | epoch 1 | time: 230.95s | valid loss 7.1692 | valid ppl 1298.8697 | learning rate 5.0000 | end of split 11 /113 | epoch 1 | time: 231.70s | valid loss 7.1442 | valid ppl 1266.7305 | learning rate 5.0000 | end of split 12 /113 | epoch 1 | time: 240.42s | valid loss 7.1839 | valid ppl 1317.9954 | learning rate 5.0000 | end of split 13 /113 | epoch 1 | time: 235.25s | valid loss 7.2127 | valid ppl 1356.5282 | learning rate 5.0000 | end of split 14 /113 | epoch 1 | time: 232.67s | valid loss 7.2704 | valid ppl 1437.1488 | learning rate 5.0000 | end of split 15 /113 | epoch 1 | time: 229.99s | valid loss 7.1410 | valid ppl 1262.7434 | learning rate 5.0000 | end of split 16 /113 | epoch 1 | time: 230.24s | valid loss 7.2028 | valid ppl 1343.1933 | learning rate 5.0000 | end of split 17 /113 | epoch 1 | time: 48.80s | valid loss 7.1864 | valid ppl 1321.2975 | learning rate 5.0000 | end of split 18 /113 | epoch 1 | time: 238.71s | valid loss 7.1344 | 
valid ppl 1254.4124 | learning rate 5.0000 | end of split 19 /113 | epoch 1 | time: 238.74s | valid loss 7.1402 | valid ppl 1261.6803 | learning rate 5.0000 | end of split 20 /113 | epoch 1 | time: 230.88s | valid loss 7.2222 | valid ppl 1369.5573 | learning rate 5.0000 | end of split 21 /113 | epoch 1 | time: 235.01s | valid loss 7.1024 | valid ppl 1214.8458 | learning rate 5.0000 | end of split 22 /113 | epoch 1 | time: 233.22s | valid loss 7.1523 | valid ppl 1277.0068 | learning rate 5.0000 | end of split 23 /113 | epoch 1 | time: 234.10s | valid loss 7.1516 | valid ppl 1276.1012 | learning rate 5.0000 | end of split 24 /113 | epoch 1 | time: 234.94s | valid loss 7.1347 | valid ppl 1254.7220 | learning rate 5.0000 | end of split 25 /113 | epoch 1 | time: 232.93s | valid loss 7.1199 | valid ppl 1236.2833 | learning rate 5.0000 | end of split 26 /113 | epoch 1 | time: 234.40s | valid loss 7.1184 | valid ppl 1234.5018 | learning rate 5.0000 | end of split 27 /113 | epoch 1 | time: 237.28s | valid loss 7.1083 | valid ppl 1222.0958 | learning rate 5.0000 | end of split 28 /113 | epoch 1 | time: 231.57s | valid loss 7.1589 | valid ppl 1285.4715 | learning rate 5.0000 | end of split 29 /113 | epoch 1 | time: 232.64s | valid loss 7.1232 | valid ppl 1240.4354 | learning rate 5.0000 | end of split 30 /113 | epoch 1 | time: 238.52s | valid loss 7.0960 | valid ppl 1207.1889 | learning rate 5.0000 | end of split 31 /113 | epoch 1 | time: 235.86s | valid loss 7.1294 | valid ppl 1248.0873 | learning rate 5.0000 | end of split 32 /113 | epoch 1 | time: 234.67s | valid loss 7.1366 | valid ppl 1257.1105 | learning rate 5.0000 | end of split 33 /113 | epoch 1 | time: 236.46s | valid loss 7.0806 | valid ppl 1188.6487 | learning rate 5.0000 | end of split 34 /113 | epoch 1 | time: 231.14s | valid loss 7.1160 | valid ppl 1231.4851 | learning rate 5.0000 | end of split 35 /113 | epoch 1 | time: 236.11s | valid loss 7.1426 | valid ppl 1264.6883 | learning rate 5.0000 | end of split 36 
/113 | epoch 1 | time: 232.98s | valid loss 7.1442 | valid ppl 1266.7118 | learning rate 5.0000 | end of split 37 /113 | epoch 1 | time: 235.77s | valid loss 7.1382 | valid ppl 1259.1016 | learning rate 5.0000 | end of split 38 /113 | epoch 1 | time: 235.38s | valid loss 7.0742 | valid ppl 1181.0755 | learning rate 5.0000 | end of split 39 /113 | epoch 1 | time: 230.26s | valid loss 7.1081 | valid ppl 1221.7934 | learning rate 5.0000 | end of split 40 /113 | epoch 1 | time: 233.25s | valid loss 7.0893 | valid ppl 1199.0533 | learning rate 5.0000 | end of split 41 /113 | epoch 1 | time: 232.96s | valid loss 7.0886 | valid ppl 1198.2460 | learning rate 5.0000 | end of split 42 /113 | epoch 1 | time: 233.86s | valid loss 7.1457 | valid ppl 1268.6031 | learning rate 5.0000 | end of split 43 /113 | epoch 1 | time: 234.62s | valid loss 7.1386 | valid ppl 1259.6532 | learning rate 5.0000 | end of split 44 /113 | epoch 1 | time: 232.69s | valid loss 7.0900 | valid ppl 1199.9118 | learning rate 5.0000 | end of split 45 /113 | epoch 1 | time: 230.84s | valid loss 7.1523 | valid ppl 1276.9780 | learning rate 5.0000 | end of split 46 /113 | epoch 1 | time: 231.71s | valid loss 7.1219 | valid ppl 1238.7760 | learning rate 5.0000 | end of split 47 /113 | epoch 1 | time: 230.86s | valid loss 7.0811 | valid ppl 1189.2806 | learning rate 5.0000 | end of split 48 /113 | epoch 1 | time: 232.63s | valid loss 7.1543 | valid ppl 1279.6527 | learning rate 5.0000 | end of split 49 /113 | epoch 1 | time: 233.86s | valid loss 7.0683 | valid ppl 1174.0986 | learning rate 5.0000 | end of split 50 /113 | epoch 1 | time: 229.15s | valid loss 7.0550 | valid ppl 1158.6403 | learning rate 5.0000 | end of split 51 /113 | epoch 1 | time: 236.63s | valid loss 7.1117 | valid ppl 1226.2546 | learning rate 5.0000 | end of split 52 /113 | epoch 1 | time: 238.10s | valid loss 7.1026 | valid ppl 1215.1584 | learning rate 5.0000 | end of split 53 /113 | epoch 1 | time: 232.74s | valid loss 7.0969 | valid 
ppl 1208.2648 | learning rate 5.0000 | end of split 54 /113 | epoch 1 | time: 238.09s | valid loss 7.0846 | valid ppl 1193.4612 | learning rate 5.0000 | end of split 55 /113 | epoch 1 | time: 233.70s | valid loss 7.1157 | valid ppl 1231.1284 | learning rate 5.0000 | end of split 56 /113 | epoch 1 | time: 230.09s | valid loss 7.0540 | valid ppl 1157.4801 | learning rate 5.0000 | end of split 57 /113 | epoch 1 | time: 235.27s | valid loss 7.0783 | valid ppl 1185.9658 | learning rate 5.0000 | end of split 58 /113 | epoch 1 | time: 233.74s | valid loss 7.1189 | valid ppl 1235.0774 | learning rate 5.0000 | end of split 59 /113 | epoch 1 | time: 229.77s | valid loss 7.0364 | valid ppl 1137.2668 | learning rate 5.0000 | end of split 60 /113 | epoch 1 | time: 233.24s | valid loss 7.0514 | valid ppl 1154.5030 | learning rate 5.0000 | end of split 61 /113 | epoch 1 | time: 236.63s | valid loss 7.1055 | valid ppl 1218.6020 | learning rate 5.0000 | end of split 62 /113 | epoch 1 | time: 233.17s | valid loss 7.1210 | valid ppl 1237.6443 | learning rate 5.0000 | end of split 63 /113 | epoch 1 | time: 234.66s | valid loss 7.0762 | valid ppl 1183.4137 | learning rate 5.0000 | end of split 64 /113 | epoch 1 | time: 232.58s | valid loss 7.1240 | valid ppl 1241.4370 | learning rate 5.0000 | end of split 65 /113 | epoch 1 | time: 231.51s | valid loss 7.0930 | valid ppl 1203.5000 | learning rate 5.0000 | end of split 66 /113 | epoch 1 | time: 232.26s | valid loss 7.1001 | valid ppl 1212.0637 | learning rate 5.0000 | end of split 67 /113 | epoch 1 | time: 228.92s | valid loss 7.0738 | valid ppl 1180.6015 | learning rate 5.0000 | end of split 68 /113 | epoch 1 | time: 230.60s | valid loss 7.1206 | valid ppl 1237.2528 | learning rate 5.0000 | end of split 69 /113 | epoch 1 | time: 232.29s | valid loss 7.1268 | valid ppl 1244.8903 | learning rate 5.0000 | end of split 70 /113 | epoch 1 | time: 234.60s | valid loss 7.1138 | valid ppl 1228.8092 | learning rate 5.0000 | end of split 71 /113 | 
epoch 1 | time: 231.33s | valid loss 7.0736 | valid ppl 1180.4231 | learning rate 5.0000 | end of split 72 /113 | epoch 1 | time: 235.50s | valid loss 7.0407 | valid ppl 1142.1916 | learning rate 5.0000 | end of split 73 /113 | epoch 1 | time: 230.23s | valid loss 7.0512 | valid ppl 1154.2604 | learning rate 5.0000 | end of split 74 /113 | epoch 1 | time: 239.00s | valid loss 7.1215 | valid ppl 1238.2501 | learning rate 5.0000 | end of split 75 /113 | epoch 1 | time: 234.03s | valid loss 7.1852 | valid ppl 1319.7906 | learning rate 5.0000 | end of split 76 /113 | epoch 1 | time: 234.28s | valid loss 7.0916 | valid ppl 1201.8453 | learning rate 5.0000 | end of split 77 /113 | epoch 1 | time: 235.71s | valid loss 7.0874 | valid ppl 1196.7356 | learning rate 5.0000 | end of split 78 /113 | epoch 1 | time: 237.06s | valid loss 7.1335 | valid ppl 1253.2911 | learning rate 5.0000 | end of split 79 /113 | epoch 1 | time: 233.74s | valid loss 7.1122 | valid ppl 1226.8927 | learning rate 5.0000 | end of split 80 /113 | epoch 1 | time: 233.17s | valid loss 7.1309 | valid ppl 1250.0614 | learning rate 5.0000 | end of split 81 /113 | epoch 1 | time: 232.30s | valid loss 7.0873 | valid ppl 1196.7297 | learning rate 5.0000 | end of split 82 /113 | epoch 1 | time: 231.22s | valid loss 7.1370 | valid ppl 1257.6055 | learning rate 5.0000 | end of split 83 /113 | epoch 1 | time: 231.43s | valid loss 7.0576 | valid ppl 1161.6918 | learning rate 5.0000 | end of split 84 /113 | epoch 1 | time: 235.02s | valid loss 7.0657 | valid ppl 1171.0550 | learning rate 5.0000 | end of split 85 /113 | epoch 1 | time: 234.79s | valid loss 7.1117 | valid ppl 1226.2184 | learning rate 5.0000 | end of split 86 /113 | epoch 1 | time: 239.30s | valid loss 7.0911 | valid ppl 1201.2320 | learning rate 5.0000 | end of split 87 /113 | epoch 1 | time: 230.62s | valid loss 7.0994 | valid ppl 1211.2212 | learning rate 5.0000 | end of split 88 /113 | epoch 1 | time: 231.93s | valid loss 7.1275 | valid ppl 
1245.7974 | learning rate 5.0000 | end of split 89 /113 | epoch 1 | time: 231.13s | valid loss 7.0923 | valid ppl 1202.6127 | learning rate 5.0000 | end of split 90 /113 | epoch 1 | time: 236.74s | valid loss 7.1520 | valid ppl 1276.6935 | learning rate 5.0000 | end of split 91 /113 | epoch 1 | time: 232.98s | valid loss 7.1159 | valid ppl 1231.3526 | learning rate 5.0000 | end of split 92 /113 | epoch 1 | time: 236.25s | valid loss 7.1405 | valid ppl 1262.0972 | learning rate 5.0000 | end of split 93 /113 | epoch 1 | time: 234.62s | valid loss 7.0885 | valid ppl 1198.1424 | learning rate 5.0000 | end of split 94 /113 | epoch 1 | time: 233.59s | valid loss 7.1003 | valid ppl 1212.3560 | learning rate 5.0000 | end of split 95 /113 | epoch 1 | time: 233.27s | valid loss 7.1059 | valid ppl 1219.0888 | learning rate 5.0000 | end of split 96 /113 | epoch 1 | time: 231.78s | valid loss 7.1232 | valid ppl 1240.4668 | learning rate 5.0000 | end of split 97 /113 | epoch 1 | time: 235.60s | valid loss 7.1186 | valid ppl 1234.7345 | learning rate 5.0000 | end of split 98 /113 | epoch 1 | time: 233.88s | valid loss 7.1161 | valid ppl 1231.6487 | learning rate 5.0000 | end of split 99 /113 | epoch 1 | time: 236.68s | valid loss 7.1076 | valid ppl 1221.1639 | learning rate 5.0000 | end of split 100 /113 | epoch 1 | time: 232.62s | valid loss 7.0984 | valid ppl 1210.0832 | learning rate 5.0000 | end of split 101 /113 | epoch 1 | time: 233.49s | valid loss 7.1288 | valid ppl 1247.4030 | learning rate 5.0000 | end of split 102 /113 | epoch 1 | time: 232.34s | valid loss 7.0934 | valid ppl 1204.0527 | learning rate 5.0000 | end of split 103 /113 | epoch 1 | time: 230.64s | valid loss 7.1062 | valid ppl 1219.4642 | learning rate 5.0000 | end of split 104 /113 | epoch 1 | time: 235.83s | valid loss 7.1531 | valid ppl 1278.0091 | learning rate 5.0000 | end of split 105 /113 | epoch 1 | time: 230.35s | valid loss 7.1200 | valid ppl 1236.4884 | learning rate 5.0000 | end of split 106 
/113 | epoch 1 | time: 231.68s | valid loss 7.1236 | valid ppl 1240.9623 | learning rate 5.0000 | end of split 107 /113 | epoch 1 | time: 236.04s | valid loss 7.0998 | valid ppl 1211.7024 | learning rate 5.0000 | end of split 108 /113 | epoch 1 | time: 231.16s | valid loss 7.1267 | valid ppl 1244.7170 | learning rate 5.0000 | end of split 109 /113 | epoch 1 | time: 235.80s | valid loss 7.1114 | valid ppl 1225.8615 | learning rate 5.0000 | end of split 110 /113 | epoch 1 | time: 229.11s | valid loss 7.0848 | valid ppl 1193.6844 | learning rate 5.0000 | end of split 111 /113 | epoch 1 | time: 232.32s | valid loss 7.0782 | valid ppl 1185.7957 | learning rate 1.2500 | end of split 112 /113 | epoch 1 | time: 232.60s | valid loss 7.0965 | valid ppl 1207.7586 | learning rate 1.2500 | end of split 113 /113 | epoch 1 | time: 237.25s | valid loss 7.1007 | valid ppl 1212.7755 | learning rate 1.2500 | end of split 1 /113 | epoch 2 | time: 229.76s | valid loss 7.0779 | valid ppl 1185.4298 | learning rate 1.2500 | end of split 2 /113 | epoch 2 | time: 232.20s | valid loss 7.0994 | valid ppl 1211.1846 | learning rate 1.2500 | end of split 3 /113 | epoch 2 | time: 230.39s | valid loss 7.0802 | valid ppl 1188.2092 | learning rate 1.2500 | end of split 4 /113 | epoch 2 | time: 232.46s | valid loss 7.0951 | valid ppl 1205.9962 | learning rate 1.2500 | end of split 5 /113 | epoch 2 | time: 232.66s | valid loss 7.1047 | valid ppl 1217.6557 | learning rate 1.2500 | end of split 6 /113 | epoch 2 | time: 231.54s | valid loss 7.0950 | valid ppl 1205.9267 | learning rate 1.2500 | end of split 7 /113 | epoch 2 | time: 234.75s | valid loss 7.1142 | valid ppl 1229.3492 | learning rate 1.2500 | end of split 8 /113 | epoch 2 | time: 235.30s | valid loss 7.0901 | valid ppl 1200.0375 | learning rate 1.2500 | end of split 9 /113 | epoch 2 | time: 235.81s | valid loss 7.0971 | valid ppl 1208.4907 | learning rate 1.2500 | end of split 10 /113 | epoch 2 | time: 230.40s | valid loss 7.0927 | valid ppl 
1203.1642 | learning rate 1.2500 | end of split 11 /113 | epoch 2 | time: 235.86s | valid loss 7.1028 | valid ppl 1215.3789 | learning rate 1.2500 | end of split 12 /113 | epoch 2 | time: 230.91s | valid loss 7.0949 | valid ppl 1205.7953 | learning rate 1.2500 | end of split 13 /113 | epoch 2 | time: 233.88s | valid loss 7.0789 | valid ppl 1186.6439 | learning rate 1.2500 | end of split 14 /113 | epoch 2 | time: 232.71s | valid loss 7.0946 | valid ppl 1205.4994 | learning rate 1.2500 | end of split 15 /113 | epoch 2 | time: 230.99s | valid loss 7.0850 | valid ppl 1193.9639 | learning rate 1.2500 | end of split 16 /113 | epoch 2 | time: 227.77s | valid loss 7.1121 | valid ppl 1226.6969 | learning rate 1.2500 | end of split 17 /113 | epoch 2 | time: 235.85s | valid loss 7.0980 | valid ppl 1209.5941 | learning rate 1.2500 | end of split 18 /113 | epoch 2 | time: 235.06s | valid loss 7.0815 | valid ppl 1189.7783 | learning rate 1.2500 | end of split 19 /113 | epoch 2 | time: 237.29s | valid loss 7.1028 | valid ppl 1215.3490 | learning rate 1.2500 | end of split 20 /113 | epoch 2 | time: 235.29s | valid loss 7.0942 | valid ppl 1204.9817 | learning rate 1.2500 | end of split 21 /113 | epoch 2 | time: 231.22s | valid loss 7.0837 | valid ppl 1192.3273 | learning rate 1.2500 | end of split 22 /113 | epoch 2 | time: 235.58s | valid loss 7.0989 | valid ppl 1210.6321 | learning rate 1.2500 | end of split 23 /113 | epoch 2 | time: 232.62s | valid loss 7.0947 | valid ppl 1205.5749 | learning rate 1.2500 | end of split 24 /113 | epoch 2 | time: 238.49s | valid loss 7.1007 | valid ppl 1212.8266 | learning rate 1.2500 | end of split 25 /113 | epoch 2 | time: 228.89s | valid loss 7.0794 | valid ppl 1187.2814 | learning rate 1.2500 | end of split 26 /113 | epoch 2 | time: 231.21s | valid loss 7.0910 | valid ppl 1201.0850 | learning rate 1.2500 | end of split 27 /113 | epoch 2 | time: 236.23s | valid loss 7.0950 | valid ppl 1205.9267 | learning rate 1.2500 | end of split 28 /113 | 
epoch 2 | time: 234.70s | valid loss 7.0858 | valid ppl 1194.8918 | learning rate 1.2500 | end of split 29 /113 | epoch 2 | time: 229.67s | valid loss 7.0637 | valid ppl 1168.7198 | learning rate 1.2500 | end of split 30 /113 | epoch 2 | time: 230.59s | valid loss 7.1101 | valid ppl 1224.2250 | learning rate 1.2500 | end of split 31 /113 | epoch 2 | time: 232.68s | valid loss 7.0836 | valid ppl 1192.2460 | learning rate 1.2500 | end of split 32 /113 | epoch 2 | time: 231.80s | valid loss 7.1094 | valid ppl 1223.3879 | learning rate 1.2500 | end of split 33 /113 | epoch 2 | time: 234.73s | valid loss 7.1026 | valid ppl 1215.0679 | learning rate 1.2500 | end of split 34 /113 | epoch 2 | time: 232.94s | valid loss 7.0845 | valid ppl 1193.3580 | learning rate 1.2500 | end of split 35 /113 | epoch 2 | time: 232.85s | valid loss 7.1046 | valid ppl 1217.5067 | learning rate 1.2500 | end of split 36 /113 | epoch 2 | time: 236.10s | valid loss 7.1064 | valid ppl 1219.7146 | learning rate 1.2500 | end of split 37 /113 | epoch 2 | time: 234.89s | valid loss 7.0999 | valid ppl 1211.8541 | learning rate 1.2500 | end of split 38 /113 | epoch 2 | time: 239.33s | valid loss 7.0895 | valid ppl 1199.2961 | learning rate 1.2500 | end of split 39 /113 | epoch 2 | time: 239.01s | valid loss 7.1112 | valid ppl 1225.6211 | learning rate 1.2500 | end of split 40 /113 | epoch 2 | time: 233.50s | valid loss 7.0895 | valid ppl 1199.3484 | learning rate 1.2500 | end of split 41 /113 | epoch 2 | time: 237.27s | valid loss 7.0723 | valid ppl 1178.8008 | learning rate 1.2500 | end of split 42 /113 | epoch 2 | time: 231.15s | valid loss 7.0958 | valid ppl 1206.8495 | learning rate 1.2500 | end of split 43 /113 | epoch 2 | time: 231.39s | valid loss 7.0922 | valid ppl 1202.5908 | learning rate 1.2500 | end of split 44 /113 | epoch 2 | time: 229.96s | valid loss 7.1024 | valid ppl 1214.8449 | learning rate 1.2500 | end of split 45 /113 | epoch 2 | time: 237.25s | valid loss 7.1115 | valid ppl 
1226.0123 | learning rate 1.2500 | end of split 46 /113 | epoch 2 | time: 233.19s | valid loss 7.0828 | valid ppl 1191.2430 | learning rate 1.2500 | end of split 47 /113 | epoch 2 | time: 232.26s | valid loss 7.0917 | valid ppl 1201.9762 | learning rate 1.2500 | end of split 48 /113 | epoch 2 | time: 227.95s | valid loss 7.0983 | valid ppl 1209.8765 | learning rate 1.2500 | end of split 49 /113 | epoch 2 | time: 232.30s | valid loss 7.0888 | valid ppl 1198.4128 | learning rate 0.3125 | end of split 50 /113 | epoch 2 | time: 238.16s | valid loss 7.0910 | valid ppl 1201.0504 | learning rate 0.3125 | end of split 51 /113 | epoch 2 | time: 233.23s | valid loss 7.0949 | valid ppl 1205.7495 | learning rate 0.3125 | end of split 52 /113 | epoch 2 | time: 232.61s | valid loss 7.0807 | valid ppl 1188.8117 | learning rate 0.3125 | end of split 53 /113 | epoch 2 | time: 233.73s | valid loss 7.0902 | valid ppl 1200.1734 | learning rate 0.3125 | end of split 54 /113 | epoch 2 | time: 230.67s | valid loss 7.0855 | valid ppl 1194.5399 | learning rate 0.3125 | end of split 55 /113 | epoch 2 | time: 235.17s | valid loss 7.0903 | valid ppl 1200.2645 | learning rate 0.3125 | end of split 56 /113 | epoch 2 | time: 230.04s | valid loss 7.0905 | valid ppl 1200.5506 | learning rate 0.3125 | end of split 57 /113 | epoch 2 | time: 235.80s | valid loss 7.0972 | valid ppl 1208.5664 | learning rate 0.3125 | end of split 58 /113 | epoch 2 | time: 233.83s | valid loss 7.0926 | valid ppl 1203.0872 | learning rate 0.3125 | end of split 59 /113 | epoch 2 | time: 234.66s | valid loss 7.0922 | valid ppl 1202.5223 | learning rate 0.3125 | end of split 60 /113 | epoch 2 | time: 231.74s | valid loss 7.0899 | valid ppl 1199.8190 | learning rate 0.3125 | end of split 61 /113 | epoch 2 | time: 228.91s | valid loss 7.0938 | valid ppl 1204.4743 | learning rate 0.3125 | end of split 62 /113 | epoch 2 | time: 235.87s | valid loss 7.0887 | valid ppl 1198.3909 | learning rate 0.3125 | end of split 63 /113 | 
epoch 2 | time: 234.42s | valid loss 7.0820 | valid ppl 1190.2886 | learning rate 0.3125 | end of split 64 /113 | epoch 2 | time: 233.77s | valid loss 7.0910 | valid ppl 1201.1087 | learning rate 0.3125 | end of split 65 /113 | epoch 2 | time: 235.55s | valid loss 7.0922 | valid ppl 1202.4961 | learning rate 0.3125 | end of split 66 /113 | epoch 2 | time: 231.77s | valid loss 7.0890 | valid ppl 1198.6597 | learning rate 0.3125 | end of split 67 /113 | epoch 2 | time: 239.03s | valid loss 7.0907 | valid ppl 1200.6899 | learning rate 0.3125 | end of split 68 /113 | epoch 2 | time: 233.79s | valid loss 7.0929 | valid ppl 1203.3503 | learning rate 0.3125 | end of split 69 /113 | epoch 2 | time: 230.34s | valid loss 7.0980 | valid ppl 1209.6052 | learning rate 0.3125 | end of split 70 /113 | epoch 2 | time: 236.49s | valid loss 7.0882 | valid ppl 1197.7819 | learning rate 0.3125 | end of split 71 /113 | epoch 2 | time: 234.44s | valid loss 7.1003 | valid ppl 1212.3714 | learning rate 0.3125 | end of split 72 /113 | epoch 2 | time: 233.01s | valid loss 7.0828 | valid ppl 1191.3159 | learning rate 0.3125 | end of split 73 /113 | epoch 2 | time: 238.78s | valid loss 7.0959 | valid ppl 1207.0328 | learning rate 0.3125 | end of split 74 /113 | epoch 2 | time: 239.67s | valid loss 7.0914 | valid ppl 1201.5850 | learning rate 0.3125 | end of split 75 /113 | epoch 2 | time: 230.83s | valid loss 7.1005 | valid ppl 1212.5495 | learning rate 0.3125 | end of split 76 /113 | epoch 2 | time: 235.05s | valid loss 7.0889 | valid ppl 1198.6319 | learning rate 0.3125 | end of split 77 /113 | epoch 2 | time: 230.27s | valid loss 7.0923 | valid ppl 1202.6914 | learning rate 0.3125 | end of split 78 /113 | epoch 2 | time: 231.51s | valid loss 7.0787 | valid ppl 1186.4144 | learning rate 0.3125 | end of split 79 /113 | epoch 2 | time: 232.70s | valid loss 7.0995 | valid ppl 1211.3830 | learning rate 0.3125 | end of split 80 /113 | epoch 2 | time: 233.21s | valid loss 7.0929 | valid ppl 
1203.3740 | learning rate 0.3125 | end of split 81 /113 | epoch 2 | time: 230.05s | valid loss 7.0802 | valid ppl 1188.1591 | learning rate 0.3125 | end of split 82 /113 | epoch 2 | time: 235.62s | valid loss 7.0860 | valid ppl 1195.0842 | learning rate 0.3125 | end of split 83 /113 | epoch 2 | time: 236.11s | valid loss 7.0906 | valid ppl 1200.6764 | learning rate 0.3125 | end of split 84 /113 | epoch 2 | time: 230.87s | valid loss 7.0850 | valid ppl 1193.9009 | learning rate 0.3125 | end of split 85 /113 | epoch 2 | time: 232.62s | valid loss 7.0939 | valid ppl 1204.6437 | learning rate 0.3125 | end of split 86 /113 | epoch 2 | time: 238.23s | valid loss 7.0856 | valid ppl 1194.6482 | learning rate 0.3125 | end of split 87 /113 | epoch 2 | time: 233.77s | valid loss 7.0942 | valid ppl 1205.0113 | learning rate 0.3125 | end of split 88 /113 | epoch 2 | time: 230.52s | valid loss 7.0954 | valid ppl 1206.3736 | learning rate 0.3125 | end of split 89 /113 | epoch 2 | time: 235.21s | valid loss 7.0953 | valid ppl 1206.2616 | learning rate 0.3125 | end of split 90 /113 | epoch 2 | time: 236.74s | valid loss 7.0902 | valid ppl 1200.1371 | learning rate 0.3125 | end of split 91 /113 | epoch 2 | time: 234.19s | valid loss 7.0940 | valid ppl 1204.7284 | learning rate 0.3125 | end of split 92 /113 | epoch 2 | time: 229.17s | valid loss 7.0667 | valid ppl 1172.2181 | learning rate 0.3125 | end of split 93 /113 | epoch 2 | time: 233.18s | valid loss 7.0851 | valid ppl 1193.9966 | learning rate 0.3125 | end of split 94 /113 | epoch 2 | time: 233.54s | valid loss 7.0983 | valid ppl 1209.8629 | learning rate 0.3125 | end of split 95 /113 | epoch 2 | time: 240.46s | valid loss 7.0915 | valid ppl 1201.7565 | learning rate 0.3125 | end of split 96 /113 | epoch 2 | time: 232.63s | valid loss 7.0925 | valid ppl 1202.8766 | learning rate 0.3125 | end of split 97 /113 | epoch 2 | time: 236.79s | valid loss 7.0868 | valid ppl 1196.0248 | learning rate 0.3125 | end of split 98 /113 | 
epoch 2 | time: 234.71s | valid loss 7.0826 | valid ppl 1191.0655 | learning rate 0.3125 | end of split 99 /113 | epoch 2 | time: 233.29s | valid loss 7.0957 | valid ppl 1206.8113 | learning rate 0.3125 | end of split 100 /113 | epoch 2 | time: 236.83s | valid loss 7.0924 | valid ppl 1202.8005 | learning rate 0.0781 | end of split 101 /113 | epoch 2 | time: 48.85s | valid loss 7.0897 | valid ppl 1199.5980 | learning rate 0.0781 | end of split 102 /113 | epoch 2 | time: 236.70s | valid loss 7.0890 | valid ppl 1198.7280 | learning rate 0.0781 | end of split 103 /113 | epoch 2 | time: 238.79s | valid loss 7.0864 | valid ppl 1195.5683 | learning rate 0.0781 | end of split 104 /113 | epoch 2 | time: 232.38s | valid loss 7.0929 | valid ppl 1203.4357 | learning rate 0.0781 | end of split 105 /113 | epoch 2 | time: 229.19s | valid loss 7.0942 | valid ppl 1204.8987 | learning rate 0.0781 | end of split 106 /113 | epoch 2 | time: 231.16s | valid loss 7.0949 | valid ppl 1205.8207 | learning rate 0.0781 | end of split 107 /113 | epoch 2 | time: 232.93s | valid loss 7.0896 | valid ppl 1199.3762 | learning rate 0.0781 | end of split 108 /113 | epoch 2 | time: 234.06s | valid loss 7.0961 | valid ppl 1207.2101 | learning rate 0.0781 | end of split 109 /113 | epoch 2 | time: 233.27s | valid loss 7.0883 | valid ppl 1197.8653 | learning rate 0.0781 | end of split 110 /113 | epoch 2 | time: 234.69s | valid loss 7.0930 | valid ppl 1203.4772 | learning rate 0.0781 | end of split 111 /113 | epoch 2 | time: 231.50s | valid loss 7.0946 | valid ppl 1205.4435 | learning rate 0.0781 | end of split 112 /113 | epoch 2 | time: 233.79s | valid loss 7.0864 | valid ppl 1195.5549 | learning rate 0.0781 | end of split 113 /113 | epoch 2 | time: 232.14s | valid loss 7.0906 | valid ppl 1200.6055 | learning rate 0.0781 TEST: valid loss 7.0908 | valid ppl 1200.8965 |
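The records above can be read back programmatically. The following is a minimal sketch, assuming the field layout seen in this log (`end of split N /M | epoch E | time: Ts | valid loss L | valid ppl P | learning rate R`) and assuming that the reported perplexity is the exponential of the reported loss, which the logged values appear to agree with. The names `RECORD`, `parse_log`, and `ppl_consistent` are hypothetical and not part of whatever script produced the log.

```python
import math
import re

# Pattern for one record, based only on the field layout visible in this log.
RECORD = re.compile(
    r"end of split\s+(\d+)\s*/\s*(\d+)\s*\|\s*epoch\s+(\d+)\s*\|\s*"
    r"time:\s*([\d.]+)s\s*\|\s*valid loss\s+([\d.]+)\s*\|\s*"
    r"valid ppl\s+([\d.]+)\s*\|\s*learning rate\s+([\d.]+)"
)


def parse_log(text):
    """Return one dict per 'end of split' record found in text."""
    # Records may wrap across lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    records = []
    for m in RECORD.finditer(flat):
        records.append({
            "split": int(m.group(1)),
            "n_splits": int(m.group(2)),
            "epoch": int(m.group(3)),
            "time_s": float(m.group(4)),
            "loss": float(m.group(5)),
            "ppl": float(m.group(6)),
            "lr": float(m.group(7)),
        })
    return records


def ppl_consistent(record, rel_tol=1e-3):
    """Check the assumed relation ppl = exp(loss), up to rounding in the log."""
    return math.isclose(math.exp(record["loss"]), record["ppl"], rel_tol=rel_tol)


if __name__ == "__main__":
    sample = (
        "| end of split 1 /113 | epoch 1 | time: 224.45s | valid loss 7.6183 "
        "| valid ppl 2035.0861 | learning rate 5.0000 |"
    )
    for rec in parse_log(sample):
        print(rec["epoch"], rec["split"], rec["loss"], ppl_consistent(rec))
```

Run over the full log, a parser like this makes it easy to list the points where the learning rate changes or to flag records with unusual timings. The final `TEST:` line does not match the pattern and would need separate handling.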