model: add training log
- loss.txt +224 -0
- training.log +0 -0
loss.txt
ADDED
@@ -0,0 +1,224 @@
|
1 |
+
| end of split 1 /113 | epoch 1 | time: 224.45s | valid loss 7.6183 | valid ppl 2035.0861 | learning rate 5.0000
|
2 |
+
| end of split 2 /113 | epoch 1 | time: 229.45s | valid loss 7.3864 | valid ppl 1613.9065 | learning rate 5.0000
|
3 |
+
| end of split 3 /113 | epoch 1 | time: 239.40s | valid loss 7.3424 | valid ppl 1544.3504 | learning rate 5.0000
|
4 |
+
| end of split 4 /113 | epoch 1 | time: 233.67s | valid loss 7.2568 | valid ppl 1417.6838 | learning rate 5.0000
|
5 |
+
| end of split 5 /113 | epoch 1 | time: 227.57s | valid loss 7.2848 | valid ppl 1458.0133 | learning rate 5.0000
|
6 |
+
| end of split 6 /113 | epoch 1 | time: 235.49s | valid loss 7.2458 | valid ppl 1402.2080 | learning rate 5.0000
|
7 |
+
| end of split 7 /113 | epoch 1 | time: 235.14s | valid loss 7.2137 | valid ppl 1357.8841 | learning rate 5.0000
|
8 |
+
| end of split 8 /113 | epoch 1 | time: 238.90s | valid loss 7.1989 | valid ppl 1337.9002 | learning rate 5.0000
|
9 |
+
| end of split 9 /113 | epoch 1 | time: 228.81s | valid loss 7.1782 | valid ppl 1310.5202 | learning rate 5.0000
|
10 |
+
| end of split 10 /113 | epoch 1 | time: 230.95s | valid loss 7.1692 | valid ppl 1298.8697 | learning rate 5.0000
|
11 |
+
| end of split 11 /113 | epoch 1 | time: 231.70s | valid loss 7.1442 | valid ppl 1266.7305 | learning rate 5.0000
|
12 |
+
| end of split 12 /113 | epoch 1 | time: 240.42s | valid loss 7.1839 | valid ppl 1317.9954 | learning rate 5.0000
|
13 |
+
| end of split 13 /113 | epoch 1 | time: 235.25s | valid loss 7.2127 | valid ppl 1356.5282 | learning rate 5.0000
|
14 |
+
| end of split 14 /113 | epoch 1 | time: 232.67s | valid loss 7.2704 | valid ppl 1437.1488 | learning rate 5.0000
|
15 |
+
| end of split 15 /113 | epoch 1 | time: 229.99s | valid loss 7.1410 | valid ppl 1262.7434 | learning rate 5.0000
|
16 |
+
| end of split 16 /113 | epoch 1 | time: 230.24s | valid loss 7.2028 | valid ppl 1343.1933 | learning rate 5.0000
|
17 |
+
| end of split 17 /113 | epoch 1 | time: 48.80s | valid loss 7.1864 | valid ppl 1321.2975 | learning rate 5.0000
|
18 |
+
| end of split 18 /113 | epoch 1 | time: 238.71s | valid loss 7.1344 | valid ppl 1254.4124 | learning rate 5.0000
|
19 |
+
| end of split 19 /113 | epoch 1 | time: 238.74s | valid loss 7.1402 | valid ppl 1261.6803 | learning rate 5.0000
|
20 |
+
| end of split 20 /113 | epoch 1 | time: 230.88s | valid loss 7.2222 | valid ppl 1369.5573 | learning rate 5.0000
|
21 |
+
| end of split 21 /113 | epoch 1 | time: 235.01s | valid loss 7.1024 | valid ppl 1214.8458 | learning rate 5.0000
|
22 |
+
| end of split 22 /113 | epoch 1 | time: 233.22s | valid loss 7.1523 | valid ppl 1277.0068 | learning rate 5.0000
|
23 |
+
| end of split 23 /113 | epoch 1 | time: 234.10s | valid loss 7.1516 | valid ppl 1276.1012 | learning rate 5.0000
|
24 |
+
| end of split 24 /113 | epoch 1 | time: 234.94s | valid loss 7.1347 | valid ppl 1254.7220 | learning rate 5.0000
|
25 |
+
| end of split 25 /113 | epoch 1 | time: 232.93s | valid loss 7.1199 | valid ppl 1236.2833 | learning rate 5.0000
|
26 |
+
| end of split 26 /113 | epoch 1 | time: 234.40s | valid loss 7.1184 | valid ppl 1234.5018 | learning rate 5.0000
|
27 |
+
| end of split 27 /113 | epoch 1 | time: 237.28s | valid loss 7.1083 | valid ppl 1222.0958 | learning rate 5.0000
|
28 |
+
| end of split 28 /113 | epoch 1 | time: 231.57s | valid loss 7.1589 | valid ppl 1285.4715 | learning rate 5.0000
|
29 |
+
| end of split 29 /113 | epoch 1 | time: 232.64s | valid loss 7.1232 | valid ppl 1240.4354 | learning rate 5.0000
|
30 |
+
| end of split 30 /113 | epoch 1 | time: 238.52s | valid loss 7.0960 | valid ppl 1207.1889 | learning rate 5.0000
|
31 |
+
| end of split 31 /113 | epoch 1 | time: 235.86s | valid loss 7.1294 | valid ppl 1248.0873 | learning rate 5.0000
|
32 |
+
| end of split 32 /113 | epoch 1 | time: 234.67s | valid loss 7.1366 | valid ppl 1257.1105 | learning rate 5.0000
|
33 |
+
| end of split 33 /113 | epoch 1 | time: 236.46s | valid loss 7.0806 | valid ppl 1188.6487 | learning rate 5.0000
|
34 |
+
| end of split 34 /113 | epoch 1 | time: 231.14s | valid loss 7.1160 | valid ppl 1231.4851 | learning rate 5.0000
|
35 |
+
| end of split 35 /113 | epoch 1 | time: 236.11s | valid loss 7.1426 | valid ppl 1264.6883 | learning rate 5.0000
|
36 |
+
| end of split 36 /113 | epoch 1 | time: 232.98s | valid loss 7.1442 | valid ppl 1266.7118 | learning rate 5.0000
|
37 |
+
| end of split 37 /113 | epoch 1 | time: 235.77s | valid loss 7.1382 | valid ppl 1259.1016 | learning rate 5.0000
|
38 |
+
| end of split 38 /113 | epoch 1 | time: 235.38s | valid loss 7.0742 | valid ppl 1181.0755 | learning rate 5.0000
|
39 |
+
| end of split 39 /113 | epoch 1 | time: 230.26s | valid loss 7.1081 | valid ppl 1221.7934 | learning rate 5.0000
|
40 |
+
| end of split 40 /113 | epoch 1 | time: 233.25s | valid loss 7.0893 | valid ppl 1199.0533 | learning rate 5.0000
|
41 |
+
| end of split 41 /113 | epoch 1 | time: 232.96s | valid loss 7.0886 | valid ppl 1198.2460 | learning rate 5.0000
|
42 |
+
| end of split 42 /113 | epoch 1 | time: 233.86s | valid loss 7.1457 | valid ppl 1268.6031 | learning rate 5.0000
|
43 |
+
| end of split 43 /113 | epoch 1 | time: 234.62s | valid loss 7.1386 | valid ppl 1259.6532 | learning rate 5.0000
|
44 |
+
| end of split 44 /113 | epoch 1 | time: 232.69s | valid loss 7.0900 | valid ppl 1199.9118 | learning rate 5.0000
|
45 |
+
| end of split 45 /113 | epoch 1 | time: 230.84s | valid loss 7.1523 | valid ppl 1276.9780 | learning rate 5.0000
|
46 |
+
| end of split 46 /113 | epoch 1 | time: 231.71s | valid loss 7.1219 | valid ppl 1238.7760 | learning rate 5.0000
|
47 |
+
| end of split 47 /113 | epoch 1 | time: 230.86s | valid loss 7.0811 | valid ppl 1189.2806 | learning rate 5.0000
|
48 |
+
| end of split 48 /113 | epoch 1 | time: 232.63s | valid loss 7.1543 | valid ppl 1279.6527 | learning rate 5.0000
|
49 |
+
| end of split 49 /113 | epoch 1 | time: 233.86s | valid loss 7.0683 | valid ppl 1174.0986 | learning rate 5.0000
|
50 |
+
| end of split 50 /113 | epoch 1 | time: 229.15s | valid loss 7.0550 | valid ppl 1158.6403 | learning rate 5.0000
|
51 |
+
| end of split 51 /113 | epoch 1 | time: 236.63s | valid loss 7.1117 | valid ppl 1226.2546 | learning rate 5.0000
|
52 |
+
| end of split 52 /113 | epoch 1 | time: 238.10s | valid loss 7.1026 | valid ppl 1215.1584 | learning rate 5.0000
|
53 |
+
| end of split 53 /113 | epoch 1 | time: 232.74s | valid loss 7.0969 | valid ppl 1208.2648 | learning rate 5.0000
|
54 |
+
| end of split 54 /113 | epoch 1 | time: 238.09s | valid loss 7.0846 | valid ppl 1193.4612 | learning rate 5.0000
|
55 |
+
| end of split 55 /113 | epoch 1 | time: 233.70s | valid loss 7.1157 | valid ppl 1231.1284 | learning rate 5.0000
|
56 |
+
| end of split 56 /113 | epoch 1 | time: 230.09s | valid loss 7.0540 | valid ppl 1157.4801 | learning rate 5.0000
|
57 |
+
| end of split 57 /113 | epoch 1 | time: 235.27s | valid loss 7.0783 | valid ppl 1185.9658 | learning rate 5.0000
|
58 |
+
| end of split 58 /113 | epoch 1 | time: 233.74s | valid loss 7.1189 | valid ppl 1235.0774 | learning rate 5.0000
|
59 |
+
| end of split 59 /113 | epoch 1 | time: 229.77s | valid loss 7.0364 | valid ppl 1137.2668 | learning rate 5.0000
|
60 |
+
| end of split 60 /113 | epoch 1 | time: 233.24s | valid loss 7.0514 | valid ppl 1154.5030 | learning rate 5.0000
|
61 |
+
| end of split 61 /113 | epoch 1 | time: 236.63s | valid loss 7.1055 | valid ppl 1218.6020 | learning rate 5.0000
|
62 |
+
| end of split 62 /113 | epoch 1 | time: 233.17s | valid loss 7.1210 | valid ppl 1237.6443 | learning rate 5.0000
|
63 |
+
| end of split 63 /113 | epoch 1 | time: 234.66s | valid loss 7.0762 | valid ppl 1183.4137 | learning rate 5.0000
|
64 |
+
| end of split 64 /113 | epoch 1 | time: 232.58s | valid loss 7.1240 | valid ppl 1241.4370 | learning rate 5.0000
|
65 |
+
| end of split 65 /113 | epoch 1 | time: 231.51s | valid loss 7.0930 | valid ppl 1203.5000 | learning rate 5.0000
|
66 |
+
| end of split 66 /113 | epoch 1 | time: 232.26s | valid loss 7.1001 | valid ppl 1212.0637 | learning rate 5.0000
|
67 |
+
| end of split 67 /113 | epoch 1 | time: 228.92s | valid loss 7.0738 | valid ppl 1180.6015 | learning rate 5.0000
|
68 |
+
| end of split 68 /113 | epoch 1 | time: 230.60s | valid loss 7.1206 | valid ppl 1237.2528 | learning rate 5.0000
|
69 |
+
| end of split 69 /113 | epoch 1 | time: 232.29s | valid loss 7.1268 | valid ppl 1244.8903 | learning rate 5.0000
|
70 |
+
| end of split 70 /113 | epoch 1 | time: 234.60s | valid loss 7.1138 | valid ppl 1228.8092 | learning rate 5.0000
|
71 |
+
| end of split 71 /113 | epoch 1 | time: 231.33s | valid loss 7.0736 | valid ppl 1180.4231 | learning rate 5.0000
|
72 |
+
| end of split 72 /113 | epoch 1 | time: 235.50s | valid loss 7.0407 | valid ppl 1142.1916 | learning rate 5.0000
|
73 |
+
| end of split 73 /113 | epoch 1 | time: 230.23s | valid loss 7.0512 | valid ppl 1154.2604 | learning rate 5.0000
|
74 |
+
| end of split 74 /113 | epoch 1 | time: 239.00s | valid loss 7.1215 | valid ppl 1238.2501 | learning rate 5.0000
|
75 |
+
| end of split 75 /113 | epoch 1 | time: 234.03s | valid loss 7.1852 | valid ppl 1319.7906 | learning rate 5.0000
|
76 |
+
| end of split 76 /113 | epoch 1 | time: 234.28s | valid loss 7.0916 | valid ppl 1201.8453 | learning rate 5.0000
|
77 |
+
| end of split 77 /113 | epoch 1 | time: 235.71s | valid loss 7.0874 | valid ppl 1196.7356 | learning rate 5.0000
|
78 |
+
| end of split 78 /113 | epoch 1 | time: 237.06s | valid loss 7.1335 | valid ppl 1253.2911 | learning rate 5.0000
|
79 |
+
| end of split 79 /113 | epoch 1 | time: 233.74s | valid loss 7.1122 | valid ppl 1226.8927 | learning rate 5.0000
|
80 |
+
| end of split 80 /113 | epoch 1 | time: 233.17s | valid loss 7.1309 | valid ppl 1250.0614 | learning rate 5.0000
|
81 |
+
| end of split 81 /113 | epoch 1 | time: 232.30s | valid loss 7.0873 | valid ppl 1196.7297 | learning rate 5.0000
|
82 |
+
| end of split 82 /113 | epoch 1 | time: 231.22s | valid loss 7.1370 | valid ppl 1257.6055 | learning rate 5.0000
|
83 |
+
| end of split 83 /113 | epoch 1 | time: 231.43s | valid loss 7.0576 | valid ppl 1161.6918 | learning rate 5.0000
|
84 |
+
| end of split 84 /113 | epoch 1 | time: 235.02s | valid loss 7.0657 | valid ppl 1171.0550 | learning rate 5.0000
|
85 |
+
| end of split 85 /113 | epoch 1 | time: 234.79s | valid loss 7.1117 | valid ppl 1226.2184 | learning rate 5.0000
|
86 |
+
| end of split 86 /113 | epoch 1 | time: 239.30s | valid loss 7.0911 | valid ppl 1201.2320 | learning rate 5.0000
|
87 |
+
| end of split 87 /113 | epoch 1 | time: 230.62s | valid loss 7.0994 | valid ppl 1211.2212 | learning rate 5.0000
|
88 |
+
| end of split 88 /113 | epoch 1 | time: 231.93s | valid loss 7.1275 | valid ppl 1245.7974 | learning rate 5.0000
|
89 |
+
| end of split 89 /113 | epoch 1 | time: 231.13s | valid loss 7.0923 | valid ppl 1202.6127 | learning rate 5.0000
|
90 |
+
| end of split 90 /113 | epoch 1 | time: 236.74s | valid loss 7.1520 | valid ppl 1276.6935 | learning rate 5.0000
|
91 |
+
| end of split 91 /113 | epoch 1 | time: 232.98s | valid loss 7.1159 | valid ppl 1231.3526 | learning rate 5.0000
|
92 |
+
| end of split 92 /113 | epoch 1 | time: 236.25s | valid loss 7.1405 | valid ppl 1262.0972 | learning rate 5.0000
|
93 |
+
| end of split 93 /113 | epoch 1 | time: 234.62s | valid loss 7.0885 | valid ppl 1198.1424 | learning rate 5.0000
|
94 |
+
| end of split 94 /113 | epoch 1 | time: 233.59s | valid loss 7.1003 | valid ppl 1212.3560 | learning rate 5.0000
|
95 |
+
| end of split 95 /113 | epoch 1 | time: 233.27s | valid loss 7.1059 | valid ppl 1219.0888 | learning rate 5.0000
|
96 |
+
| end of split 96 /113 | epoch 1 | time: 231.78s | valid loss 7.1232 | valid ppl 1240.4668 | learning rate 5.0000
|
97 |
+
| end of split 97 /113 | epoch 1 | time: 235.60s | valid loss 7.1186 | valid ppl 1234.7345 | learning rate 5.0000
|
98 |
+
| end of split 98 /113 | epoch 1 | time: 233.88s | valid loss 7.1161 | valid ppl 1231.6487 | learning rate 5.0000
|
99 |
+
| end of split 99 /113 | epoch 1 | time: 236.68s | valid loss 7.1076 | valid ppl 1221.1639 | learning rate 5.0000
|
100 |
+
| end of split 100 /113 | epoch 1 | time: 232.62s | valid loss 7.0984 | valid ppl 1210.0832 | learning rate 5.0000
|
101 |
+
| end of split 101 /113 | epoch 1 | time: 233.49s | valid loss 7.1288 | valid ppl 1247.4030 | learning rate 5.0000
|
102 |
+
| end of split 102 /113 | epoch 1 | time: 232.34s | valid loss 7.0934 | valid ppl 1204.0527 | learning rate 5.0000
|
103 |
+
| end of split 103 /113 | epoch 1 | time: 230.64s | valid loss 7.1062 | valid ppl 1219.4642 | learning rate 5.0000
|
104 |
+
| end of split 104 /113 | epoch 1 | time: 235.83s | valid loss 7.1531 | valid ppl 1278.0091 | learning rate 5.0000
|
105 |
+
| end of split 105 /113 | epoch 1 | time: 230.35s | valid loss 7.1200 | valid ppl 1236.4884 | learning rate 5.0000
|
106 |
+
| end of split 106 /113 | epoch 1 | time: 231.68s | valid loss 7.1236 | valid ppl 1240.9623 | learning rate 5.0000
|
107 |
+
| end of split 107 /113 | epoch 1 | time: 236.04s | valid loss 7.0998 | valid ppl 1211.7024 | learning rate 5.0000
|
108 |
+
| end of split 108 /113 | epoch 1 | time: 231.16s | valid loss 7.1267 | valid ppl 1244.7170 | learning rate 5.0000
|
109 |
+
| end of split 109 /113 | epoch 1 | time: 235.80s | valid loss 7.1114 | valid ppl 1225.8615 | learning rate 5.0000
|
110 |
+
| end of split 110 /113 | epoch 1 | time: 229.11s | valid loss 7.0848 | valid ppl 1193.6844 | learning rate 5.0000
|
111 |
+
| end of split 111 /113 | epoch 1 | time: 232.32s | valid loss 7.0782 | valid ppl 1185.7957 | learning rate 1.2500
|
112 |
+
| end of split 112 /113 | epoch 1 | time: 232.60s | valid loss 7.0965 | valid ppl 1207.7586 | learning rate 1.2500
|
113 |
+
| end of split 113 /113 | epoch 1 | time: 237.25s | valid loss 7.1007 | valid ppl 1212.7755 | learning rate 1.2500
|
114 |
+
| end of split 1 /113 | epoch 2 | time: 229.76s | valid loss 7.0779 | valid ppl 1185.4298 | learning rate 1.2500
|
115 |
+
| end of split 2 /113 | epoch 2 | time: 232.20s | valid loss 7.0994 | valid ppl 1211.1846 | learning rate 1.2500
|
116 |
+
| end of split 3 /113 | epoch 2 | time: 230.39s | valid loss 7.0802 | valid ppl 1188.2092 | learning rate 1.2500
|
117 |
+
| end of split 4 /113 | epoch 2 | time: 232.46s | valid loss 7.0951 | valid ppl 1205.9962 | learning rate 1.2500
|
118 |
+
| end of split 5 /113 | epoch 2 | time: 232.66s | valid loss 7.1047 | valid ppl 1217.6557 | learning rate 1.2500
|
119 |
+
| end of split 6 /113 | epoch 2 | time: 231.54s | valid loss 7.0950 | valid ppl 1205.9267 | learning rate 1.2500
|
120 |
+
| end of split 7 /113 | epoch 2 | time: 234.75s | valid loss 7.1142 | valid ppl 1229.3492 | learning rate 1.2500
|
121 |
+
| end of split 8 /113 | epoch 2 | time: 235.30s | valid loss 7.0901 | valid ppl 1200.0375 | learning rate 1.2500
|
122 |
+
| end of split 9 /113 | epoch 2 | time: 235.81s | valid loss 7.0971 | valid ppl 1208.4907 | learning rate 1.2500
|
123 |
+
| end of split 10 /113 | epoch 2 | time: 230.40s | valid loss 7.0927 | valid ppl 1203.1642 | learning rate 1.2500
|
124 |
+
| end of split 11 /113 | epoch 2 | time: 235.86s | valid loss 7.1028 | valid ppl 1215.3789 | learning rate 1.2500
|
125 |
+
| end of split 12 /113 | epoch 2 | time: 230.91s | valid loss 7.0949 | valid ppl 1205.7953 | learning rate 1.2500
|
126 |
+
| end of split 13 /113 | epoch 2 | time: 233.88s | valid loss 7.0789 | valid ppl 1186.6439 | learning rate 1.2500
|
127 |
+
| end of split 14 /113 | epoch 2 | time: 232.71s | valid loss 7.0946 | valid ppl 1205.4994 | learning rate 1.2500
|
128 |
+
| end of split 15 /113 | epoch 2 | time: 230.99s | valid loss 7.0850 | valid ppl 1193.9639 | learning rate 1.2500
|
129 |
+
| end of split 16 /113 | epoch 2 | time: 227.77s | valid loss 7.1121 | valid ppl 1226.6969 | learning rate 1.2500
|
130 |
+
| end of split 17 /113 | epoch 2 | time: 235.85s | valid loss 7.0980 | valid ppl 1209.5941 | learning rate 1.2500
|
131 |
+
| end of split 18 /113 | epoch 2 | time: 235.06s | valid loss 7.0815 | valid ppl 1189.7783 | learning rate 1.2500
|
132 |
+
| end of split 19 /113 | epoch 2 | time: 237.29s | valid loss 7.1028 | valid ppl 1215.3490 | learning rate 1.2500
|
133 |
+
| end of split 20 /113 | epoch 2 | time: 235.29s | valid loss 7.0942 | valid ppl 1204.9817 | learning rate 1.2500
|
134 |
+
| end of split 21 /113 | epoch 2 | time: 231.22s | valid loss 7.0837 | valid ppl 1192.3273 | learning rate 1.2500
|
135 |
+
| end of split 22 /113 | epoch 2 | time: 235.58s | valid loss 7.0989 | valid ppl 1210.6321 | learning rate 1.2500
|
136 |
+
| end of split 23 /113 | epoch 2 | time: 232.62s | valid loss 7.0947 | valid ppl 1205.5749 | learning rate 1.2500
|
137 |
+
| end of split 24 /113 | epoch 2 | time: 238.49s | valid loss 7.1007 | valid ppl 1212.8266 | learning rate 1.2500
|
138 |
+
| end of split 25 /113 | epoch 2 | time: 228.89s | valid loss 7.0794 | valid ppl 1187.2814 | learning rate 1.2500
|
139 |
+
| end of split 26 /113 | epoch 2 | time: 231.21s | valid loss 7.0910 | valid ppl 1201.0850 | learning rate 1.2500
|
140 |
+
| end of split 27 /113 | epoch 2 | time: 236.23s | valid loss 7.0950 | valid ppl 1205.9267 | learning rate 1.2500
|
141 |
+
| end of split 28 /113 | epoch 2 | time: 234.70s | valid loss 7.0858 | valid ppl 1194.8918 | learning rate 1.2500
|
142 |
+
| end of split 29 /113 | epoch 2 | time: 229.67s | valid loss 7.0637 | valid ppl 1168.7198 | learning rate 1.2500
|
143 |
+
| end of split 30 /113 | epoch 2 | time: 230.59s | valid loss 7.1101 | valid ppl 1224.2250 | learning rate 1.2500
|
144 |
+
| end of split 31 /113 | epoch 2 | time: 232.68s | valid loss 7.0836 | valid ppl 1192.2460 | learning rate 1.2500
|
145 |
+
| end of split 32 /113 | epoch 2 | time: 231.80s | valid loss 7.1094 | valid ppl 1223.3879 | learning rate 1.2500
|
146 |
+
| end of split 33 /113 | epoch 2 | time: 234.73s | valid loss 7.1026 | valid ppl 1215.0679 | learning rate 1.2500
|
147 |
+
| end of split 34 /113 | epoch 2 | time: 232.94s | valid loss 7.0845 | valid ppl 1193.3580 | learning rate 1.2500
|
148 |
+
| end of split 35 /113 | epoch 2 | time: 232.85s | valid loss 7.1046 | valid ppl 1217.5067 | learning rate 1.2500
|
149 |
+
| end of split 36 /113 | epoch 2 | time: 236.10s | valid loss 7.1064 | valid ppl 1219.7146 | learning rate 1.2500
|
150 |
+
| end of split 37 /113 | epoch 2 | time: 234.89s | valid loss 7.0999 | valid ppl 1211.8541 | learning rate 1.2500
|
151 |
+
| end of split 38 /113 | epoch 2 | time: 239.33s | valid loss 7.0895 | valid ppl 1199.2961 | learning rate 1.2500
|
152 |
+
| end of split 39 /113 | epoch 2 | time: 239.01s | valid loss 7.1112 | valid ppl 1225.6211 | learning rate 1.2500
|
153 |
+
| end of split 40 /113 | epoch 2 | time: 233.50s | valid loss 7.0895 | valid ppl 1199.3484 | learning rate 1.2500
|
154 |
+
| end of split 41 /113 | epoch 2 | time: 237.27s | valid loss 7.0723 | valid ppl 1178.8008 | learning rate 1.2500
|
155 |
+
| end of split 42 /113 | epoch 2 | time: 231.15s | valid loss 7.0958 | valid ppl 1206.8495 | learning rate 1.2500
|
156 |
+
| end of split 43 /113 | epoch 2 | time: 231.39s | valid loss 7.0922 | valid ppl 1202.5908 | learning rate 1.2500
|
157 |
+
| end of split 44 /113 | epoch 2 | time: 229.96s | valid loss 7.1024 | valid ppl 1214.8449 | learning rate 1.2500
|
158 |
+
| end of split 45 /113 | epoch 2 | time: 237.25s | valid loss 7.1115 | valid ppl 1226.0123 | learning rate 1.2500
|
159 |
+
| end of split 46 /113 | epoch 2 | time: 233.19s | valid loss 7.0828 | valid ppl 1191.2430 | learning rate 1.2500
|
160 |
+
| end of split 47 /113 | epoch 2 | time: 232.26s | valid loss 7.0917 | valid ppl 1201.9762 | learning rate 1.2500
|
161 |
+
| end of split 48 /113 | epoch 2 | time: 227.95s | valid loss 7.0983 | valid ppl 1209.8765 | learning rate 1.2500
|
162 |
+
| end of split 49 /113 | epoch 2 | time: 232.30s | valid loss 7.0888 | valid ppl 1198.4128 | learning rate 0.3125
|
163 |
+
| end of split 50 /113 | epoch 2 | time: 238.16s | valid loss 7.0910 | valid ppl 1201.0504 | learning rate 0.3125
|
164 |
+
| end of split 51 /113 | epoch 2 | time: 233.23s | valid loss 7.0949 | valid ppl 1205.7495 | learning rate 0.3125
|
165 |
+
| end of split 52 /113 | epoch 2 | time: 232.61s | valid loss 7.0807 | valid ppl 1188.8117 | learning rate 0.3125
|
166 |
+
| end of split 53 /113 | epoch 2 | time: 233.73s | valid loss 7.0902 | valid ppl 1200.1734 | learning rate 0.3125
|
167 |
+
| end of split 54 /113 | epoch 2 | time: 230.67s | valid loss 7.0855 | valid ppl 1194.5399 | learning rate 0.3125
|
168 |
+
| end of split 55 /113 | epoch 2 | time: 235.17s | valid loss 7.0903 | valid ppl 1200.2645 | learning rate 0.3125
|
169 |
+
| end of split 56 /113 | epoch 2 | time: 230.04s | valid loss 7.0905 | valid ppl 1200.5506 | learning rate 0.3125
|
170 |
+
| end of split 57 /113 | epoch 2 | time: 235.80s | valid loss 7.0972 | valid ppl 1208.5664 | learning rate 0.3125
|
171 |
+
| end of split 58 /113 | epoch 2 | time: 233.83s | valid loss 7.0926 | valid ppl 1203.0872 | learning rate 0.3125
|
172 |
+
| end of split 59 /113 | epoch 2 | time: 234.66s | valid loss 7.0922 | valid ppl 1202.5223 | learning rate 0.3125
|
173 |
+
| end of split 60 /113 | epoch 2 | time: 231.74s | valid loss 7.0899 | valid ppl 1199.8190 | learning rate 0.3125
|
174 |
+
| end of split 61 /113 | epoch 2 | time: 228.91s | valid loss 7.0938 | valid ppl 1204.4743 | learning rate 0.3125
|
175 |
+
| end of split 62 /113 | epoch 2 | time: 235.87s | valid loss 7.0887 | valid ppl 1198.3909 | learning rate 0.3125
|
176 |
+
| end of split 63 /113 | epoch 2 | time: 234.42s | valid loss 7.0820 | valid ppl 1190.2886 | learning rate 0.3125
|
177 |
+
| end of split 64 /113 | epoch 2 | time: 233.77s | valid loss 7.0910 | valid ppl 1201.1087 | learning rate 0.3125
|
178 |
+
| end of split 65 /113 | epoch 2 | time: 235.55s | valid loss 7.0922 | valid ppl 1202.4961 | learning rate 0.3125
|
179 |
+
| end of split 66 /113 | epoch 2 | time: 231.77s | valid loss 7.0890 | valid ppl 1198.6597 | learning rate 0.3125
|
180 |
+
| end of split 67 /113 | epoch 2 | time: 239.03s | valid loss 7.0907 | valid ppl 1200.6899 | learning rate 0.3125
|
181 |
+
| end of split 68 /113 | epoch 2 | time: 233.79s | valid loss 7.0929 | valid ppl 1203.3503 | learning rate 0.3125
|
182 |
+
| end of split 69 /113 | epoch 2 | time: 230.34s | valid loss 7.0980 | valid ppl 1209.6052 | learning rate 0.3125
|
183 |
+
| end of split 70 /113 | epoch 2 | time: 236.49s | valid loss 7.0882 | valid ppl 1197.7819 | learning rate 0.3125
|
184 |
+
| end of split 71 /113 | epoch 2 | time: 234.44s | valid loss 7.1003 | valid ppl 1212.3714 | learning rate 0.3125
|
185 |
+
| end of split 72 /113 | epoch 2 | time: 233.01s | valid loss 7.0828 | valid ppl 1191.3159 | learning rate 0.3125
|
186 |
+
| end of split 73 /113 | epoch 2 | time: 238.78s | valid loss 7.0959 | valid ppl 1207.0328 | learning rate 0.3125
|
187 |
+
| end of split 74 /113 | epoch 2 | time: 239.67s | valid loss 7.0914 | valid ppl 1201.5850 | learning rate 0.3125
|
188 |
+
| end of split 75 /113 | epoch 2 | time: 230.83s | valid loss 7.1005 | valid ppl 1212.5495 | learning rate 0.3125
|
189 |
+
| end of split 76 /113 | epoch 2 | time: 235.05s | valid loss 7.0889 | valid ppl 1198.6319 | learning rate 0.3125
|
190 |
+
| end of split 77 /113 | epoch 2 | time: 230.27s | valid loss 7.0923 | valid ppl 1202.6914 | learning rate 0.3125
|
191 |
+
| end of split 78 /113 | epoch 2 | time: 231.51s | valid loss 7.0787 | valid ppl 1186.4144 | learning rate 0.3125
|
192 |
+
| end of split 79 /113 | epoch 2 | time: 232.70s | valid loss 7.0995 | valid ppl 1211.3830 | learning rate 0.3125
|
193 |
+
| end of split 80 /113 | epoch 2 | time: 233.21s | valid loss 7.0929 | valid ppl 1203.3740 | learning rate 0.3125
|
194 |
+
| end of split 81 /113 | epoch 2 | time: 230.05s | valid loss 7.0802 | valid ppl 1188.1591 | learning rate 0.3125
|
195 |
+
| end of split 82 /113 | epoch 2 | time: 235.62s | valid loss 7.0860 | valid ppl 1195.0842 | learning rate 0.3125
|
196 |
+
| end of split 83 /113 | epoch 2 | time: 236.11s | valid loss 7.0906 | valid ppl 1200.6764 | learning rate 0.3125
|
197 |
+
| end of split 84 /113 | epoch 2 | time: 230.87s | valid loss 7.0850 | valid ppl 1193.9009 | learning rate 0.3125
|
198 |
+
| end of split 85 /113 | epoch 2 | time: 232.62s | valid loss 7.0939 | valid ppl 1204.6437 | learning rate 0.3125
|
199 |
+
| end of split 86 /113 | epoch 2 | time: 238.23s | valid loss 7.0856 | valid ppl 1194.6482 | learning rate 0.3125
|
200 |
+
| end of split 87 /113 | epoch 2 | time: 233.77s | valid loss 7.0942 | valid ppl 1205.0113 | learning rate 0.3125
|
201 |
+
| end of split 88 /113 | epoch 2 | time: 230.52s | valid loss 7.0954 | valid ppl 1206.3736 | learning rate 0.3125
|
202 |
+
| end of split 89 /113 | epoch 2 | time: 235.21s | valid loss 7.0953 | valid ppl 1206.2616 | learning rate 0.3125
|
203 |
+
| end of split 90 /113 | epoch 2 | time: 236.74s | valid loss 7.0902 | valid ppl 1200.1371 | learning rate 0.3125
|
204 |
+
| end of split 91 /113 | epoch 2 | time: 234.19s | valid loss 7.0940 | valid ppl 1204.7284 | learning rate 0.3125
|
205 |
+
| end of split 92 /113 | epoch 2 | time: 229.17s | valid loss 7.0667 | valid ppl 1172.2181 | learning rate 0.3125
|
206 |
+
| end of split 93 /113 | epoch 2 | time: 233.18s | valid loss 7.0851 | valid ppl 1193.9966 | learning rate 0.3125
|
207 |
+
| end of split 94 /113 | epoch 2 | time: 233.54s | valid loss 7.0983 | valid ppl 1209.8629 | learning rate 0.3125
|
208 |
+
| end of split 95 /113 | epoch 2 | time: 240.46s | valid loss 7.0915 | valid ppl 1201.7565 | learning rate 0.3125
|
209 |
+
| end of split 96 /113 | epoch 2 | time: 232.63s | valid loss 7.0925 | valid ppl 1202.8766 | learning rate 0.3125
|
210 |
+
| end of split 97 /113 | epoch 2 | time: 236.79s | valid loss 7.0868 | valid ppl 1196.0248 | learning rate 0.3125
|
211 |
+
| end of split 98 /113 | epoch 2 | time: 234.71s | valid loss 7.0826 | valid ppl 1191.0655 | learning rate 0.3125
|
212 |
+
| end of split 99 /113 | epoch 2 | time: 233.29s | valid loss 7.0957 | valid ppl 1206.8113 | learning rate 0.3125
|
213 |
+
| end of split 100 /113 | epoch 2 | time: 236.83s | valid loss 7.0924 | valid ppl 1202.8005 | learning rate 0.0781
|
214 |
+
| end of split 101 /113 | epoch 2 | time: 48.85s | valid loss 7.0897 | valid ppl 1199.5980 | learning rate 0.0781
|
215 |
+
| end of split 102 /113 | epoch 2 | time: 236.70s | valid loss 7.0890 | valid ppl 1198.7280 | learning rate 0.0781
|
216 |
+
| end of split 103 /113 | epoch 2 | time: 238.79s | valid loss 7.0864 | valid ppl 1195.5683 | learning rate 0.0781
|
217 |
+
| end of split 104 /113 | epoch 2 | time: 232.38s | valid loss 7.0929 | valid ppl 1203.4357 | learning rate 0.0781
|
218 |
+
| end of split 105 /113 | epoch 2 | time: 229.19s | valid loss 7.0942 | valid ppl 1204.8987 | learning rate 0.0781
|
219 |
+
| end of split 106 /113 | epoch 2 | time: 231.16s | valid loss 7.0949 | valid ppl 1205.8207 | learning rate 0.0781
|
220 |
+
| end of split 107 /113 | epoch 2 | time: 232.93s | valid loss 7.0896 | valid ppl 1199.3762 | learning rate 0.0781
|
221 |
+
| end of split 108 /113 | epoch 2 | time: 234.06s | valid loss 7.0961 | valid ppl 1207.2101 | learning rate 0.0781
|
222 |
+
| end of split 109 /113 | epoch 2 | time: 233.27s | valid loss 7.0883 | valid ppl 1197.8653 | learning rate 0.0781
|
223 |
+
| end of split 110 /113 | epoch 2 | time: 234.69s | valid loss 7.0930 | valid ppl 1203.4772 | learning rate 0.0781
|
224 |
+
| end of split 111 /113 | epoch 2 | time: 231.50s | valid loss 7.0946 | valid ppl 1205.4435 | learning rate 0.0781
|
training.log
ADDED
The diff for this file is too large to render.