stefan-it committed on
Commit
962439a
1 Parent(s): b8dd5fd

model: add training log

Files changed (2)
  1. loss.txt +224 -0
  2. training.log +0 -0
loss.txt ADDED
@@ -0,0 +1,224 @@
+ | end of split 1 /113 | epoch 1 | time: 224.45s | valid loss 7.6183 | valid ppl 2035.0861 | learning rate 5.0000
+ | end of split 2 /113 | epoch 1 | time: 229.45s | valid loss 7.3864 | valid ppl 1613.9065 | learning rate 5.0000
+ | end of split 3 /113 | epoch 1 | time: 239.40s | valid loss 7.3424 | valid ppl 1544.3504 | learning rate 5.0000
+ | end of split 4 /113 | epoch 1 | time: 233.67s | valid loss 7.2568 | valid ppl 1417.6838 | learning rate 5.0000
+ | end of split 5 /113 | epoch 1 | time: 227.57s | valid loss 7.2848 | valid ppl 1458.0133 | learning rate 5.0000
+ | end of split 6 /113 | epoch 1 | time: 235.49s | valid loss 7.2458 | valid ppl 1402.2080 | learning rate 5.0000
+ | end of split 7 /113 | epoch 1 | time: 235.14s | valid loss 7.2137 | valid ppl 1357.8841 | learning rate 5.0000
+ | end of split 8 /113 | epoch 1 | time: 238.90s | valid loss 7.1989 | valid ppl 1337.9002 | learning rate 5.0000
+ | end of split 9 /113 | epoch 1 | time: 228.81s | valid loss 7.1782 | valid ppl 1310.5202 | learning rate 5.0000
+ | end of split 10 /113 | epoch 1 | time: 230.95s | valid loss 7.1692 | valid ppl 1298.8697 | learning rate 5.0000
+ | end of split 11 /113 | epoch 1 | time: 231.70s | valid loss 7.1442 | valid ppl 1266.7305 | learning rate 5.0000
+ | end of split 12 /113 | epoch 1 | time: 240.42s | valid loss 7.1839 | valid ppl 1317.9954 | learning rate 5.0000
+ | end of split 13 /113 | epoch 1 | time: 235.25s | valid loss 7.2127 | valid ppl 1356.5282 | learning rate 5.0000
+ | end of split 14 /113 | epoch 1 | time: 232.67s | valid loss 7.2704 | valid ppl 1437.1488 | learning rate 5.0000
+ | end of split 15 /113 | epoch 1 | time: 229.99s | valid loss 7.1410 | valid ppl 1262.7434 | learning rate 5.0000
+ | end of split 16 /113 | epoch 1 | time: 230.24s | valid loss 7.2028 | valid ppl 1343.1933 | learning rate 5.0000
+ | end of split 17 /113 | epoch 1 | time: 48.80s | valid loss 7.1864 | valid ppl 1321.2975 | learning rate 5.0000
+ | end of split 18 /113 | epoch 1 | time: 238.71s | valid loss 7.1344 | valid ppl 1254.4124 | learning rate 5.0000
+ | end of split 19 /113 | epoch 1 | time: 238.74s | valid loss 7.1402 | valid ppl 1261.6803 | learning rate 5.0000
+ | end of split 20 /113 | epoch 1 | time: 230.88s | valid loss 7.2222 | valid ppl 1369.5573 | learning rate 5.0000
+ | end of split 21 /113 | epoch 1 | time: 235.01s | valid loss 7.1024 | valid ppl 1214.8458 | learning rate 5.0000
+ | end of split 22 /113 | epoch 1 | time: 233.22s | valid loss 7.1523 | valid ppl 1277.0068 | learning rate 5.0000
+ | end of split 23 /113 | epoch 1 | time: 234.10s | valid loss 7.1516 | valid ppl 1276.1012 | learning rate 5.0000
+ | end of split 24 /113 | epoch 1 | time: 234.94s | valid loss 7.1347 | valid ppl 1254.7220 | learning rate 5.0000
+ | end of split 25 /113 | epoch 1 | time: 232.93s | valid loss 7.1199 | valid ppl 1236.2833 | learning rate 5.0000
+ | end of split 26 /113 | epoch 1 | time: 234.40s | valid loss 7.1184 | valid ppl 1234.5018 | learning rate 5.0000
+ | end of split 27 /113 | epoch 1 | time: 237.28s | valid loss 7.1083 | valid ppl 1222.0958 | learning rate 5.0000
+ | end of split 28 /113 | epoch 1 | time: 231.57s | valid loss 7.1589 | valid ppl 1285.4715 | learning rate 5.0000
+ | end of split 29 /113 | epoch 1 | time: 232.64s | valid loss 7.1232 | valid ppl 1240.4354 | learning rate 5.0000
+ | end of split 30 /113 | epoch 1 | time: 238.52s | valid loss 7.0960 | valid ppl 1207.1889 | learning rate 5.0000
+ | end of split 31 /113 | epoch 1 | time: 235.86s | valid loss 7.1294 | valid ppl 1248.0873 | learning rate 5.0000
+ | end of split 32 /113 | epoch 1 | time: 234.67s | valid loss 7.1366 | valid ppl 1257.1105 | learning rate 5.0000
+ | end of split 33 /113 | epoch 1 | time: 236.46s | valid loss 7.0806 | valid ppl 1188.6487 | learning rate 5.0000
+ | end of split 34 /113 | epoch 1 | time: 231.14s | valid loss 7.1160 | valid ppl 1231.4851 | learning rate 5.0000
+ | end of split 35 /113 | epoch 1 | time: 236.11s | valid loss 7.1426 | valid ppl 1264.6883 | learning rate 5.0000
+ | end of split 36 /113 | epoch 1 | time: 232.98s | valid loss 7.1442 | valid ppl 1266.7118 | learning rate 5.0000
+ | end of split 37 /113 | epoch 1 | time: 235.77s | valid loss 7.1382 | valid ppl 1259.1016 | learning rate 5.0000
+ | end of split 38 /113 | epoch 1 | time: 235.38s | valid loss 7.0742 | valid ppl 1181.0755 | learning rate 5.0000
+ | end of split 39 /113 | epoch 1 | time: 230.26s | valid loss 7.1081 | valid ppl 1221.7934 | learning rate 5.0000
+ | end of split 40 /113 | epoch 1 | time: 233.25s | valid loss 7.0893 | valid ppl 1199.0533 | learning rate 5.0000
+ | end of split 41 /113 | epoch 1 | time: 232.96s | valid loss 7.0886 | valid ppl 1198.2460 | learning rate 5.0000
+ | end of split 42 /113 | epoch 1 | time: 233.86s | valid loss 7.1457 | valid ppl 1268.6031 | learning rate 5.0000
+ | end of split 43 /113 | epoch 1 | time: 234.62s | valid loss 7.1386 | valid ppl 1259.6532 | learning rate 5.0000
+ | end of split 44 /113 | epoch 1 | time: 232.69s | valid loss 7.0900 | valid ppl 1199.9118 | learning rate 5.0000
+ | end of split 45 /113 | epoch 1 | time: 230.84s | valid loss 7.1523 | valid ppl 1276.9780 | learning rate 5.0000
+ | end of split 46 /113 | epoch 1 | time: 231.71s | valid loss 7.1219 | valid ppl 1238.7760 | learning rate 5.0000
+ | end of split 47 /113 | epoch 1 | time: 230.86s | valid loss 7.0811 | valid ppl 1189.2806 | learning rate 5.0000
+ | end of split 48 /113 | epoch 1 | time: 232.63s | valid loss 7.1543 | valid ppl 1279.6527 | learning rate 5.0000
+ | end of split 49 /113 | epoch 1 | time: 233.86s | valid loss 7.0683 | valid ppl 1174.0986 | learning rate 5.0000
+ | end of split 50 /113 | epoch 1 | time: 229.15s | valid loss 7.0550 | valid ppl 1158.6403 | learning rate 5.0000
+ | end of split 51 /113 | epoch 1 | time: 236.63s | valid loss 7.1117 | valid ppl 1226.2546 | learning rate 5.0000
+ | end of split 52 /113 | epoch 1 | time: 238.10s | valid loss 7.1026 | valid ppl 1215.1584 | learning rate 5.0000
+ | end of split 53 /113 | epoch 1 | time: 232.74s | valid loss 7.0969 | valid ppl 1208.2648 | learning rate 5.0000
+ | end of split 54 /113 | epoch 1 | time: 238.09s | valid loss 7.0846 | valid ppl 1193.4612 | learning rate 5.0000
+ | end of split 55 /113 | epoch 1 | time: 233.70s | valid loss 7.1157 | valid ppl 1231.1284 | learning rate 5.0000
+ | end of split 56 /113 | epoch 1 | time: 230.09s | valid loss 7.0540 | valid ppl 1157.4801 | learning rate 5.0000
+ | end of split 57 /113 | epoch 1 | time: 235.27s | valid loss 7.0783 | valid ppl 1185.9658 | learning rate 5.0000
+ | end of split 58 /113 | epoch 1 | time: 233.74s | valid loss 7.1189 | valid ppl 1235.0774 | learning rate 5.0000
+ | end of split 59 /113 | epoch 1 | time: 229.77s | valid loss 7.0364 | valid ppl 1137.2668 | learning rate 5.0000
+ | end of split 60 /113 | epoch 1 | time: 233.24s | valid loss 7.0514 | valid ppl 1154.5030 | learning rate 5.0000
+ | end of split 61 /113 | epoch 1 | time: 236.63s | valid loss 7.1055 | valid ppl 1218.6020 | learning rate 5.0000
+ | end of split 62 /113 | epoch 1 | time: 233.17s | valid loss 7.1210 | valid ppl 1237.6443 | learning rate 5.0000
+ | end of split 63 /113 | epoch 1 | time: 234.66s | valid loss 7.0762 | valid ppl 1183.4137 | learning rate 5.0000
+ | end of split 64 /113 | epoch 1 | time: 232.58s | valid loss 7.1240 | valid ppl 1241.4370 | learning rate 5.0000
+ | end of split 65 /113 | epoch 1 | time: 231.51s | valid loss 7.0930 | valid ppl 1203.5000 | learning rate 5.0000
+ | end of split 66 /113 | epoch 1 | time: 232.26s | valid loss 7.1001 | valid ppl 1212.0637 | learning rate 5.0000
+ | end of split 67 /113 | epoch 1 | time: 228.92s | valid loss 7.0738 | valid ppl 1180.6015 | learning rate 5.0000
+ | end of split 68 /113 | epoch 1 | time: 230.60s | valid loss 7.1206 | valid ppl 1237.2528 | learning rate 5.0000
+ | end of split 69 /113 | epoch 1 | time: 232.29s | valid loss 7.1268 | valid ppl 1244.8903 | learning rate 5.0000
+ | end of split 70 /113 | epoch 1 | time: 234.60s | valid loss 7.1138 | valid ppl 1228.8092 | learning rate 5.0000
+ | end of split 71 /113 | epoch 1 | time: 231.33s | valid loss 7.0736 | valid ppl 1180.4231 | learning rate 5.0000
+ | end of split 72 /113 | epoch 1 | time: 235.50s | valid loss 7.0407 | valid ppl 1142.1916 | learning rate 5.0000
+ | end of split 73 /113 | epoch 1 | time: 230.23s | valid loss 7.0512 | valid ppl 1154.2604 | learning rate 5.0000
+ | end of split 74 /113 | epoch 1 | time: 239.00s | valid loss 7.1215 | valid ppl 1238.2501 | learning rate 5.0000
+ | end of split 75 /113 | epoch 1 | time: 234.03s | valid loss 7.1852 | valid ppl 1319.7906 | learning rate 5.0000
+ | end of split 76 /113 | epoch 1 | time: 234.28s | valid loss 7.0916 | valid ppl 1201.8453 | learning rate 5.0000
+ | end of split 77 /113 | epoch 1 | time: 235.71s | valid loss 7.0874 | valid ppl 1196.7356 | learning rate 5.0000
+ | end of split 78 /113 | epoch 1 | time: 237.06s | valid loss 7.1335 | valid ppl 1253.2911 | learning rate 5.0000
+ | end of split 79 /113 | epoch 1 | time: 233.74s | valid loss 7.1122 | valid ppl 1226.8927 | learning rate 5.0000
+ | end of split 80 /113 | epoch 1 | time: 233.17s | valid loss 7.1309 | valid ppl 1250.0614 | learning rate 5.0000
+ | end of split 81 /113 | epoch 1 | time: 232.30s | valid loss 7.0873 | valid ppl 1196.7297 | learning rate 5.0000
+ | end of split 82 /113 | epoch 1 | time: 231.22s | valid loss 7.1370 | valid ppl 1257.6055 | learning rate 5.0000
+ | end of split 83 /113 | epoch 1 | time: 231.43s | valid loss 7.0576 | valid ppl 1161.6918 | learning rate 5.0000
+ | end of split 84 /113 | epoch 1 | time: 235.02s | valid loss 7.0657 | valid ppl 1171.0550 | learning rate 5.0000
+ | end of split 85 /113 | epoch 1 | time: 234.79s | valid loss 7.1117 | valid ppl 1226.2184 | learning rate 5.0000
+ | end of split 86 /113 | epoch 1 | time: 239.30s | valid loss 7.0911 | valid ppl 1201.2320 | learning rate 5.0000
+ | end of split 87 /113 | epoch 1 | time: 230.62s | valid loss 7.0994 | valid ppl 1211.2212 | learning rate 5.0000
+ | end of split 88 /113 | epoch 1 | time: 231.93s | valid loss 7.1275 | valid ppl 1245.7974 | learning rate 5.0000
+ | end of split 89 /113 | epoch 1 | time: 231.13s | valid loss 7.0923 | valid ppl 1202.6127 | learning rate 5.0000
+ | end of split 90 /113 | epoch 1 | time: 236.74s | valid loss 7.1520 | valid ppl 1276.6935 | learning rate 5.0000
+ | end of split 91 /113 | epoch 1 | time: 232.98s | valid loss 7.1159 | valid ppl 1231.3526 | learning rate 5.0000
+ | end of split 92 /113 | epoch 1 | time: 236.25s | valid loss 7.1405 | valid ppl 1262.0972 | learning rate 5.0000
+ | end of split 93 /113 | epoch 1 | time: 234.62s | valid loss 7.0885 | valid ppl 1198.1424 | learning rate 5.0000
+ | end of split 94 /113 | epoch 1 | time: 233.59s | valid loss 7.1003 | valid ppl 1212.3560 | learning rate 5.0000
+ | end of split 95 /113 | epoch 1 | time: 233.27s | valid loss 7.1059 | valid ppl 1219.0888 | learning rate 5.0000
+ | end of split 96 /113 | epoch 1 | time: 231.78s | valid loss 7.1232 | valid ppl 1240.4668 | learning rate 5.0000
+ | end of split 97 /113 | epoch 1 | time: 235.60s | valid loss 7.1186 | valid ppl 1234.7345 | learning rate 5.0000
+ | end of split 98 /113 | epoch 1 | time: 233.88s | valid loss 7.1161 | valid ppl 1231.6487 | learning rate 5.0000
+ | end of split 99 /113 | epoch 1 | time: 236.68s | valid loss 7.1076 | valid ppl 1221.1639 | learning rate 5.0000
+ | end of split 100 /113 | epoch 1 | time: 232.62s | valid loss 7.0984 | valid ppl 1210.0832 | learning rate 5.0000
+ | end of split 101 /113 | epoch 1 | time: 233.49s | valid loss 7.1288 | valid ppl 1247.4030 | learning rate 5.0000
+ | end of split 102 /113 | epoch 1 | time: 232.34s | valid loss 7.0934 | valid ppl 1204.0527 | learning rate 5.0000
+ | end of split 103 /113 | epoch 1 | time: 230.64s | valid loss 7.1062 | valid ppl 1219.4642 | learning rate 5.0000
+ | end of split 104 /113 | epoch 1 | time: 235.83s | valid loss 7.1531 | valid ppl 1278.0091 | learning rate 5.0000
+ | end of split 105 /113 | epoch 1 | time: 230.35s | valid loss 7.1200 | valid ppl 1236.4884 | learning rate 5.0000
+ | end of split 106 /113 | epoch 1 | time: 231.68s | valid loss 7.1236 | valid ppl 1240.9623 | learning rate 5.0000
+ | end of split 107 /113 | epoch 1 | time: 236.04s | valid loss 7.0998 | valid ppl 1211.7024 | learning rate 5.0000
+ | end of split 108 /113 | epoch 1 | time: 231.16s | valid loss 7.1267 | valid ppl 1244.7170 | learning rate 5.0000
+ | end of split 109 /113 | epoch 1 | time: 235.80s | valid loss 7.1114 | valid ppl 1225.8615 | learning rate 5.0000
+ | end of split 110 /113 | epoch 1 | time: 229.11s | valid loss 7.0848 | valid ppl 1193.6844 | learning rate 5.0000
+ | end of split 111 /113 | epoch 1 | time: 232.32s | valid loss 7.0782 | valid ppl 1185.7957 | learning rate 1.2500
+ | end of split 112 /113 | epoch 1 | time: 232.60s | valid loss 7.0965 | valid ppl 1207.7586 | learning rate 1.2500
+ | end of split 113 /113 | epoch 1 | time: 237.25s | valid loss 7.1007 | valid ppl 1212.7755 | learning rate 1.2500
+ | end of split 1 /113 | epoch 2 | time: 229.76s | valid loss 7.0779 | valid ppl 1185.4298 | learning rate 1.2500
+ | end of split 2 /113 | epoch 2 | time: 232.20s | valid loss 7.0994 | valid ppl 1211.1846 | learning rate 1.2500
+ | end of split 3 /113 | epoch 2 | time: 230.39s | valid loss 7.0802 | valid ppl 1188.2092 | learning rate 1.2500
+ | end of split 4 /113 | epoch 2 | time: 232.46s | valid loss 7.0951 | valid ppl 1205.9962 | learning rate 1.2500
+ | end of split 5 /113 | epoch 2 | time: 232.66s | valid loss 7.1047 | valid ppl 1217.6557 | learning rate 1.2500
+ | end of split 6 /113 | epoch 2 | time: 231.54s | valid loss 7.0950 | valid ppl 1205.9267 | learning rate 1.2500
+ | end of split 7 /113 | epoch 2 | time: 234.75s | valid loss 7.1142 | valid ppl 1229.3492 | learning rate 1.2500
+ | end of split 8 /113 | epoch 2 | time: 235.30s | valid loss 7.0901 | valid ppl 1200.0375 | learning rate 1.2500
+ | end of split 9 /113 | epoch 2 | time: 235.81s | valid loss 7.0971 | valid ppl 1208.4907 | learning rate 1.2500
+ | end of split 10 /113 | epoch 2 | time: 230.40s | valid loss 7.0927 | valid ppl 1203.1642 | learning rate 1.2500
+ | end of split 11 /113 | epoch 2 | time: 235.86s | valid loss 7.1028 | valid ppl 1215.3789 | learning rate 1.2500
+ | end of split 12 /113 | epoch 2 | time: 230.91s | valid loss 7.0949 | valid ppl 1205.7953 | learning rate 1.2500
+ | end of split 13 /113 | epoch 2 | time: 233.88s | valid loss 7.0789 | valid ppl 1186.6439 | learning rate 1.2500
+ | end of split 14 /113 | epoch 2 | time: 232.71s | valid loss 7.0946 | valid ppl 1205.4994 | learning rate 1.2500
+ | end of split 15 /113 | epoch 2 | time: 230.99s | valid loss 7.0850 | valid ppl 1193.9639 | learning rate 1.2500
+ | end of split 16 /113 | epoch 2 | time: 227.77s | valid loss 7.1121 | valid ppl 1226.6969 | learning rate 1.2500
+ | end of split 17 /113 | epoch 2 | time: 235.85s | valid loss 7.0980 | valid ppl 1209.5941 | learning rate 1.2500
+ | end of split 18 /113 | epoch 2 | time: 235.06s | valid loss 7.0815 | valid ppl 1189.7783 | learning rate 1.2500
+ | end of split 19 /113 | epoch 2 | time: 237.29s | valid loss 7.1028 | valid ppl 1215.3490 | learning rate 1.2500
+ | end of split 20 /113 | epoch 2 | time: 235.29s | valid loss 7.0942 | valid ppl 1204.9817 | learning rate 1.2500
+ | end of split 21 /113 | epoch 2 | time: 231.22s | valid loss 7.0837 | valid ppl 1192.3273 | learning rate 1.2500
+ | end of split 22 /113 | epoch 2 | time: 235.58s | valid loss 7.0989 | valid ppl 1210.6321 | learning rate 1.2500
+ | end of split 23 /113 | epoch 2 | time: 232.62s | valid loss 7.0947 | valid ppl 1205.5749 | learning rate 1.2500
+ | end of split 24 /113 | epoch 2 | time: 238.49s | valid loss 7.1007 | valid ppl 1212.8266 | learning rate 1.2500
+ | end of split 25 /113 | epoch 2 | time: 228.89s | valid loss 7.0794 | valid ppl 1187.2814 | learning rate 1.2500
+ | end of split 26 /113 | epoch 2 | time: 231.21s | valid loss 7.0910 | valid ppl 1201.0850 | learning rate 1.2500
+ | end of split 27 /113 | epoch 2 | time: 236.23s | valid loss 7.0950 | valid ppl 1205.9267 | learning rate 1.2500
+ | end of split 28 /113 | epoch 2 | time: 234.70s | valid loss 7.0858 | valid ppl 1194.8918 | learning rate 1.2500
+ | end of split 29 /113 | epoch 2 | time: 229.67s | valid loss 7.0637 | valid ppl 1168.7198 | learning rate 1.2500
+ | end of split 30 /113 | epoch 2 | time: 230.59s | valid loss 7.1101 | valid ppl 1224.2250 | learning rate 1.2500
+ | end of split 31 /113 | epoch 2 | time: 232.68s | valid loss 7.0836 | valid ppl 1192.2460 | learning rate 1.2500
+ | end of split 32 /113 | epoch 2 | time: 231.80s | valid loss 7.1094 | valid ppl 1223.3879 | learning rate 1.2500
+ | end of split 33 /113 | epoch 2 | time: 234.73s | valid loss 7.1026 | valid ppl 1215.0679 | learning rate 1.2500
+ | end of split 34 /113 | epoch 2 | time: 232.94s | valid loss 7.0845 | valid ppl 1193.3580 | learning rate 1.2500
+ | end of split 35 /113 | epoch 2 | time: 232.85s | valid loss 7.1046 | valid ppl 1217.5067 | learning rate 1.2500
+ | end of split 36 /113 | epoch 2 | time: 236.10s | valid loss 7.1064 | valid ppl 1219.7146 | learning rate 1.2500
+ | end of split 37 /113 | epoch 2 | time: 234.89s | valid loss 7.0999 | valid ppl 1211.8541 | learning rate 1.2500
+ | end of split 38 /113 | epoch 2 | time: 239.33s | valid loss 7.0895 | valid ppl 1199.2961 | learning rate 1.2500
+ | end of split 39 /113 | epoch 2 | time: 239.01s | valid loss 7.1112 | valid ppl 1225.6211 | learning rate 1.2500
+ | end of split 40 /113 | epoch 2 | time: 233.50s | valid loss 7.0895 | valid ppl 1199.3484 | learning rate 1.2500
+ | end of split 41 /113 | epoch 2 | time: 237.27s | valid loss 7.0723 | valid ppl 1178.8008 | learning rate 1.2500
+ | end of split 42 /113 | epoch 2 | time: 231.15s | valid loss 7.0958 | valid ppl 1206.8495 | learning rate 1.2500
+ | end of split 43 /113 | epoch 2 | time: 231.39s | valid loss 7.0922 | valid ppl 1202.5908 | learning rate 1.2500
+ | end of split 44 /113 | epoch 2 | time: 229.96s | valid loss 7.1024 | valid ppl 1214.8449 | learning rate 1.2500
+ | end of split 45 /113 | epoch 2 | time: 237.25s | valid loss 7.1115 | valid ppl 1226.0123 | learning rate 1.2500
+ | end of split 46 /113 | epoch 2 | time: 233.19s | valid loss 7.0828 | valid ppl 1191.2430 | learning rate 1.2500
+ | end of split 47 /113 | epoch 2 | time: 232.26s | valid loss 7.0917 | valid ppl 1201.9762 | learning rate 1.2500
+ | end of split 48 /113 | epoch 2 | time: 227.95s | valid loss 7.0983 | valid ppl 1209.8765 | learning rate 1.2500
+ | end of split 49 /113 | epoch 2 | time: 232.30s | valid loss 7.0888 | valid ppl 1198.4128 | learning rate 0.3125
+ | end of split 50 /113 | epoch 2 | time: 238.16s | valid loss 7.0910 | valid ppl 1201.0504 | learning rate 0.3125
+ | end of split 51 /113 | epoch 2 | time: 233.23s | valid loss 7.0949 | valid ppl 1205.7495 | learning rate 0.3125
+ | end of split 52 /113 | epoch 2 | time: 232.61s | valid loss 7.0807 | valid ppl 1188.8117 | learning rate 0.3125
+ | end of split 53 /113 | epoch 2 | time: 233.73s | valid loss 7.0902 | valid ppl 1200.1734 | learning rate 0.3125
+ | end of split 54 /113 | epoch 2 | time: 230.67s | valid loss 7.0855 | valid ppl 1194.5399 | learning rate 0.3125
+ | end of split 55 /113 | epoch 2 | time: 235.17s | valid loss 7.0903 | valid ppl 1200.2645 | learning rate 0.3125
+ | end of split 56 /113 | epoch 2 | time: 230.04s | valid loss 7.0905 | valid ppl 1200.5506 | learning rate 0.3125
+ | end of split 57 /113 | epoch 2 | time: 235.80s | valid loss 7.0972 | valid ppl 1208.5664 | learning rate 0.3125
+ | end of split 58 /113 | epoch 2 | time: 233.83s | valid loss 7.0926 | valid ppl 1203.0872 | learning rate 0.3125
+ | end of split 59 /113 | epoch 2 | time: 234.66s | valid loss 7.0922 | valid ppl 1202.5223 | learning rate 0.3125
+ | end of split 60 /113 | epoch 2 | time: 231.74s | valid loss 7.0899 | valid ppl 1199.8190 | learning rate 0.3125
+ | end of split 61 /113 | epoch 2 | time: 228.91s | valid loss 7.0938 | valid ppl 1204.4743 | learning rate 0.3125
+ | end of split 62 /113 | epoch 2 | time: 235.87s | valid loss 7.0887 | valid ppl 1198.3909 | learning rate 0.3125
+ | end of split 63 /113 | epoch 2 | time: 234.42s | valid loss 7.0820 | valid ppl 1190.2886 | learning rate 0.3125
+ | end of split 64 /113 | epoch 2 | time: 233.77s | valid loss 7.0910 | valid ppl 1201.1087 | learning rate 0.3125
+ | end of split 65 /113 | epoch 2 | time: 235.55s | valid loss 7.0922 | valid ppl 1202.4961 | learning rate 0.3125
+ | end of split 66 /113 | epoch 2 | time: 231.77s | valid loss 7.0890 | valid ppl 1198.6597 | learning rate 0.3125
+ | end of split 67 /113 | epoch 2 | time: 239.03s | valid loss 7.0907 | valid ppl 1200.6899 | learning rate 0.3125
+ | end of split 68 /113 | epoch 2 | time: 233.79s | valid loss 7.0929 | valid ppl 1203.3503 | learning rate 0.3125
+ | end of split 69 /113 | epoch 2 | time: 230.34s | valid loss 7.0980 | valid ppl 1209.6052 | learning rate 0.3125
+ | end of split 70 /113 | epoch 2 | time: 236.49s | valid loss 7.0882 | valid ppl 1197.7819 | learning rate 0.3125
+ | end of split 71 /113 | epoch 2 | time: 234.44s | valid loss 7.1003 | valid ppl 1212.3714 | learning rate 0.3125
+ | end of split 72 /113 | epoch 2 | time: 233.01s | valid loss 7.0828 | valid ppl 1191.3159 | learning rate 0.3125
+ | end of split 73 /113 | epoch 2 | time: 238.78s | valid loss 7.0959 | valid ppl 1207.0328 | learning rate 0.3125
+ | end of split 74 /113 | epoch 2 | time: 239.67s | valid loss 7.0914 | valid ppl 1201.5850 | learning rate 0.3125
+ | end of split 75 /113 | epoch 2 | time: 230.83s | valid loss 7.1005 | valid ppl 1212.5495 | learning rate 0.3125
+ | end of split 76 /113 | epoch 2 | time: 235.05s | valid loss 7.0889 | valid ppl 1198.6319 | learning rate 0.3125
+ | end of split 77 /113 | epoch 2 | time: 230.27s | valid loss 7.0923 | valid ppl 1202.6914 | learning rate 0.3125
+ | end of split 78 /113 | epoch 2 | time: 231.51s | valid loss 7.0787 | valid ppl 1186.4144 | learning rate 0.3125
+ | end of split 79 /113 | epoch 2 | time: 232.70s | valid loss 7.0995 | valid ppl 1211.3830 | learning rate 0.3125
+ | end of split 80 /113 | epoch 2 | time: 233.21s | valid loss 7.0929 | valid ppl 1203.3740 | learning rate 0.3125
+ | end of split 81 /113 | epoch 2 | time: 230.05s | valid loss 7.0802 | valid ppl 1188.1591 | learning rate 0.3125
+ | end of split 82 /113 | epoch 2 | time: 235.62s | valid loss 7.0860 | valid ppl 1195.0842 | learning rate 0.3125
+ | end of split 83 /113 | epoch 2 | time: 236.11s | valid loss 7.0906 | valid ppl 1200.6764 | learning rate 0.3125
+ | end of split 84 /113 | epoch 2 | time: 230.87s | valid loss 7.0850 | valid ppl 1193.9009 | learning rate 0.3125
+ | end of split 85 /113 | epoch 2 | time: 232.62s | valid loss 7.0939 | valid ppl 1204.6437 | learning rate 0.3125
+ | end of split 86 /113 | epoch 2 | time: 238.23s | valid loss 7.0856 | valid ppl 1194.6482 | learning rate 0.3125
+ | end of split 87 /113 | epoch 2 | time: 233.77s | valid loss 7.0942 | valid ppl 1205.0113 | learning rate 0.3125
+ | end of split 88 /113 | epoch 2 | time: 230.52s | valid loss 7.0954 | valid ppl 1206.3736 | learning rate 0.3125
+ | end of split 89 /113 | epoch 2 | time: 235.21s | valid loss 7.0953 | valid ppl 1206.2616 | learning rate 0.3125
+ | end of split 90 /113 | epoch 2 | time: 236.74s | valid loss 7.0902 | valid ppl 1200.1371 | learning rate 0.3125
+ | end of split 91 /113 | epoch 2 | time: 234.19s | valid loss 7.0940 | valid ppl 1204.7284 | learning rate 0.3125
+ | end of split 92 /113 | epoch 2 | time: 229.17s | valid loss 7.0667 | valid ppl 1172.2181 | learning rate 0.3125
+ | end of split 93 /113 | epoch 2 | time: 233.18s | valid loss 7.0851 | valid ppl 1193.9966 | learning rate 0.3125
+ | end of split 94 /113 | epoch 2 | time: 233.54s | valid loss 7.0983 | valid ppl 1209.8629 | learning rate 0.3125
+ | end of split 95 /113 | epoch 2 | time: 240.46s | valid loss 7.0915 | valid ppl 1201.7565 | learning rate 0.3125
+ | end of split 96 /113 | epoch 2 | time: 232.63s | valid loss 7.0925 | valid ppl 1202.8766 | learning rate 0.3125
+ | end of split 97 /113 | epoch 2 | time: 236.79s | valid loss 7.0868 | valid ppl 1196.0248 | learning rate 0.3125
+ | end of split 98 /113 | epoch 2 | time: 234.71s | valid loss 7.0826 | valid ppl 1191.0655 | learning rate 0.3125
+ | end of split 99 /113 | epoch 2 | time: 233.29s | valid loss 7.0957 | valid ppl 1206.8113 | learning rate 0.3125
+ | end of split 100 /113 | epoch 2 | time: 236.83s | valid loss 7.0924 | valid ppl 1202.8005 | learning rate 0.0781
+ | end of split 101 /113 | epoch 2 | time: 48.85s | valid loss 7.0897 | valid ppl 1199.5980 | learning rate 0.0781
+ | end of split 102 /113 | epoch 2 | time: 236.70s | valid loss 7.0890 | valid ppl 1198.7280 | learning rate 0.0781
+ | end of split 103 /113 | epoch 2 | time: 238.79s | valid loss 7.0864 | valid ppl 1195.5683 | learning rate 0.0781
+ | end of split 104 /113 | epoch 2 | time: 232.38s | valid loss 7.0929 | valid ppl 1203.4357 | learning rate 0.0781
+ | end of split 105 /113 | epoch 2 | time: 229.19s | valid loss 7.0942 | valid ppl 1204.8987 | learning rate 0.0781
+ | end of split 106 /113 | epoch 2 | time: 231.16s | valid loss 7.0949 | valid ppl 1205.8207 | learning rate 0.0781
+ | end of split 107 /113 | epoch 2 | time: 232.93s | valid loss 7.0896 | valid ppl 1199.3762 | learning rate 0.0781
+ | end of split 108 /113 | epoch 2 | time: 234.06s | valid loss 7.0961 | valid ppl 1207.2101 | learning rate 0.0781
+ | end of split 109 /113 | epoch 2 | time: 233.27s | valid loss 7.0883 | valid ppl 1197.8653 | learning rate 0.0781
+ | end of split 110 /113 | epoch 2 | time: 234.69s | valid loss 7.0930 | valid ppl 1203.4772 | learning rate 0.0781
+ | end of split 111 /113 | epoch 2 | time: 231.50s | valid loss 7.0946 | valid ppl 1205.4435 | learning rate 0.0781
training.log ADDED
The diff for this file is too large to render. See raw diff
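
The entries added to loss.txt share one fixed layout (split index, epoch, time, validation loss, validation perplexity, learning rate), so they can be read back mechanically. Below is a minimal sketch, assuming that layout holds for every line; the names `LINE_RE`, `parse_loss_lines`, and `lr_changes` are hypothetical helpers for illustration and are not part of this commit or of any library. It also checks an observation that is consistent with the logged values: the reported perplexity matches exp(valid loss) up to rounding.

```python
import math
import re

# Hypothetical pattern for one loss.txt entry; the leading "+" diff marker is ignored.
LINE_RE = re.compile(
    r"end of split\s+(?P<split>\d+)\s*/(?P<total>\d+)"
    r"\s*\|\s*epoch\s+(?P<epoch>\d+)"
    r"\s*\|\s*time:\s*(?P<time>[\d.]+)s"
    r"\s*\|\s*valid loss\s+(?P<loss>[\d.]+)"
    r"\s*\|\s*valid ppl\s+(?P<ppl>[\d.]+)"
    r"\s*\|\s*learning rate\s+(?P<lr>[\d.]+)"
)


def parse_loss_lines(lines):
    """Return one dict per matching line; lines that do not match are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        records.append({
            "split": int(m["split"]),
            "total": int(m["total"]),
            "epoch": int(m["epoch"]),
            "time": float(m["time"]),
            "loss": float(m["loss"]),
            "ppl": float(m["ppl"]),
            "lr": float(m["lr"]),
        })
    return records


def lr_changes(records):
    """List (epoch, split, old_lr, new_lr) wherever the logged learning rate changes."""
    changes = []
    for prev, cur in zip(records, records[1:]):
        if cur["lr"] != prev["lr"]:
            changes.append((cur["epoch"], cur["split"], prev["lr"], cur["lr"]))
    return changes


# Two entries copied from the diff above.
sample = [
    "+ | end of split 110 /113 | epoch 1 | time: 229.11s | valid loss 7.0848 | valid ppl 1193.6844 | learning rate 5.0000",
    "+ | end of split 111 /113 | epoch 1 | time: 232.32s | valid loss 7.0782 | valid ppl 1185.7957 | learning rate 1.2500",
]
records = parse_loss_lines(sample)
changes = lr_changes(records)

# Relative gap between the logged perplexity and exp(loss); small because the loss is rounded.
ppl_gaps = [abs(math.exp(r["loss"]) - r["ppl"]) / r["ppl"] for r in records]
```

Run over the full file, `lr_changes` would report each point where the learning rate column drops (5.0000, 1.2500, 0.3125, 0.0781 in this log, each a quarter of the previous value).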