[1] loss: 3.922, train acc: 9.710 test acc: 16.210  19.120 s
[2] loss: 3.306, train acc: 19.734 test acc: 24.700  19.272 s
[3] loss: 2.890, train acc: 27.350 test acc: 30.530  19.156 s
[4] loss: 2.572, train acc: 33.976 test acc: 34.630  18.599 s
[5] loss: 2.351, train acc: 38.444 test acc: 39.340  19.358 s
[6] loss: 2.182, train acc: 42.156 test acc: 41.620  19.322 s
[7] loss: 2.060, train acc: 44.980 test acc: 43.830  18.912 s
[8] loss: 1.946, train acc: 47.628 test acc: 45.620  19.201 s
[9] loss: 1.861, train acc: 49.356 test acc: 47.350  18.739 s
[10] loss: 1.778, train acc: 51.572 test acc: 47.440  19.149 s
[11] loss: 1.720, train acc: 52.758 test acc: 48.690  18.959 s
[12] loss: 1.665, train acc: 54.102 test acc: 50.070  18.651 s
[13] loss: 1.611, train acc: 55.504 test acc: 51.010  18.681 s
[14] loss: 1.565, train acc: 56.742 test acc: 51.310  18.636 s
[15] loss: 1.523, train acc: 57.590 test acc: 50.750  19.178 s
[16] loss: 1.493, train acc: 58.122 test acc: 52.760  18.726 s
[17] loss: 1.456, train acc: 59.148 test acc: 53.310  19.150 s
[18] loss: 1.425, train acc: 60.064 test acc: 53.020  18.625 s
[19] loss: 1.395, train acc: 60.686 test acc: 53.310  18.945 s
[20] loss: 1.366, train acc: 61.512 test acc: 54.200  20.388 s
[21] loss: 1.337, train acc: 62.098 test acc: 54.400  18.636 s
[22] loss: 1.317, train acc: 62.850 test acc: 54.450  18.698 s
[23] loss: 1.288, train acc: 63.556 test acc: 54.980  24.444 s
[24] loss: 1.270, train acc: 63.970 test acc: 54.640  19.223 s
[25] loss: 1.242, train acc: 64.418 test acc: 55.670  19.068 s
[26] loss: 1.228, train acc: 65.022 test acc: 55.390  18.723 s
[27] loss: 1.212, train acc: 65.308 test acc: 56.070  18.621 s
[28] loss: 1.192, train acc: 65.950 test acc: 55.740  18.721 s
[29] loss: 1.172, train acc: 66.610 test acc: 56.360  18.999 s
[30] loss: 1.162, train acc: 66.744 test acc: 56.040  19.265 s
[31] loss: 1.139, train acc: 67.142 test acc: 56.610  18.620 s
[32] loss: 1.127, train acc: 67.530 test acc: 56.350  18.952 s
[33] loss: 1.113, train acc: 67.938 test acc: 56.930  19.421 s
[34] loss: 1.103, train acc: 68.186 test acc: 56.610  19.007 s
[35] loss: 1.081, train acc: 68.868 test acc: 56.850  19.002 s
[36] loss: 1.077, train acc: 68.798 test acc: 57.090  18.931 s
[37] loss: 1.063, train acc: 69.366 test acc: 57.010  18.142 s
[38] loss: 1.048, train acc: 69.726 test acc: 57.600  18.577 s
[39] loss: 1.034, train acc: 70.048 test acc: 57.630  19.337 s
[40] loss: 1.021, train acc: 70.398 test acc: 58.170  18.606 s
[41] loss: 1.013, train acc: 70.720 test acc: 57.340  19.218 s
[42] loss: 1.001, train acc: 71.000 test acc: 58.030  18.656 s
[43] loss: 0.991, train acc: 71.130 test acc: 58.170  18.731 s
[44] loss: 0.982, train acc: 71.388 test acc: 58.150  18.939 s
[45] loss: 0.972, train acc: 71.786 test acc: 57.920  20.176 s
[46] loss: 0.959, train acc: 72.054 test acc: 58.770  19.481 s
[47] loss: 0.946, train acc: 72.474 test acc: 57.930  19.065 s
[48] loss: 0.935, train acc: 72.638 test acc: 57.890  19.334 s
[49] loss: 0.928, train acc: 72.724 test acc: 58.370  18.734 s
[50] loss: 0.925, train acc: 72.930 test acc: 58.690  18.609 s
[51] loss: 0.911, train acc: 73.478 test acc: 58.120  19.188 s
[52] loss: 0.906, train acc: 73.406 test acc: 57.950  18.921 s
[53] loss: 0.896, train acc: 73.732 test acc: 58.300  18.764 s
[54] loss: 0.891, train acc: 73.804 test acc: 58.070  18.855 s
[55] loss: 0.881, train acc: 74.204 test acc: 57.960  18.914 s
[56] loss: 0.873, train acc: 74.446 test acc: 58.690  18.841 s
[57] loss: 0.865, train acc: 74.332 test acc: 58.390  19.063 s
[58] loss: 0.856, train acc: 74.850 test acc: 58.630  19.052 s
[59] loss: 0.849, train acc: 75.136 test acc: 59.100  18.923 s
[60] loss: 0.851, train acc: 74.982 test acc: 58.100  18.426 s
[61] loss: 0.839, train acc: 75.072 test acc: 57.940  19.223 s
[62] loss: 0.828, train acc: 75.610 test acc: 58.210  19.462 s
[63] loss: 0.821, train acc: 75.916 test acc: 57.980  18.999 s
[64] loss: 0.816, train acc: 75.868 test acc: 59.340  18.477 s
[65] loss: 0.806, train acc: 76.154 test acc: 58.640  19.336 s
[66] loss: 0.802, train acc: 76.380 test acc: 59.180  19.209 s
[67] loss: 0.794, train acc: 76.694 test acc: 59.110  18.478 s
[68] loss: 0.792, train acc: 76.544 test acc: 59.230  18.842 s
[69] loss: 0.781, train acc: 77.010 test acc: 58.640  18.791 s
[70] loss: 0.777, train acc: 77.002 test acc: 59.170  19.276 s
[71] loss: 0.773, train acc: 77.146 test acc: 59.250  19.578 s
[72] loss: 0.767, train acc: 77.232 test acc: 59.000  19.281 s
[73] loss: 0.760, train acc: 77.390 test acc: 59.020  18.526 s
[74] loss: 0.762, train acc: 77.430 test acc: 58.650  18.691 s
[75] loss: 0.755, train acc: 77.836 test acc: 59.310  20.628 s
[76] loss: 0.750, train acc: 77.732 test acc: 59.170  18.904 s
[77] loss: 0.745, train acc: 77.560 test acc: 58.820  19.015 s
[78] loss: 0.738, train acc: 78.148 test acc: 58.990  19.101 s
[79] loss: 0.729, train acc: 78.210 test acc: 58.660  18.940 s
[80] loss: 0.728, train acc: 78.240 test acc: 58.870  18.424 s
[81] loss: 0.723, train acc: 78.442 test acc: 58.510  19.399 s
[82] loss: 0.718, train acc: 78.706 test acc: 58.610  18.937 s
[83] loss: 0.712, train acc: 78.724 test acc: 58.560  19.048 s
[84] loss: 0.705, train acc: 78.776 test acc: 58.810  18.905 s
[85] loss: 0.704, train acc: 78.982 test acc: 58.250  19.172 s
[86] loss: 0.698, train acc: 79.308 test acc: 58.380  19.347 s
[87] loss: 0.693, train acc: 79.318 test acc: 58.450  19.214 s
[88] loss: 0.686, train acc: 79.432 test acc: 59.050  19.092 s
[89] loss: 0.683, train acc: 79.574 test acc: 59.140  18.626 s
[90] loss: 0.679, train acc: 79.708 test acc: 58.440  19.234 s
[91] loss: 0.672, train acc: 79.968 test acc: 58.560  18.429 s
[92] loss: 0.669, train acc: 80.088 test acc: 58.820  18.924 s
[93] loss: 0.660, train acc: 80.174 test acc: 58.480  18.966 s
[94] loss: 0.664, train acc: 80.024 test acc: 58.970  18.989 s
[95] loss: 0.656, train acc: 80.338 test acc: 59.070  18.756 s
[96] loss: 0.654, train acc: 80.278 test acc: 59.270  19.369 s
[97] loss: 0.648, train acc: 80.548 test acc: 59.050  19.416 s
[98] loss: 0.641, train acc: 80.714 test acc: 59.120  18.987 s
[99] loss: 0.646, train acc: 80.624 test acc: 58.520  18.932 s
[100] loss: 0.638, train acc: 80.954 test acc: 59.050  19.094 s
[1] loss: 0.580, train acc: 82.956 test acc: 60.010  18.612 s
[2] loss: 0.557, train acc: 83.868 test acc: 59.950  18.785 s
[3] loss: 0.552, train acc: 83.906 test acc: 60.080  19.294 s
[4] loss: 0.546, train acc: 84.102 test acc: 60.190  19.067 s
[5] loss: 0.539, train acc: 84.412 test acc: 59.960  18.777 s
[6] loss: 0.539, train acc: 84.556 test acc: 60.070  18.761 s
[7] loss: 0.536, train acc: 84.534 test acc: 60.050  18.752 s
[8] loss: 0.530, train acc: 84.778 test acc: 59.820  18.836 s
[9] loss: 0.533, train acc: 84.568 test acc: 60.220  19.284 s
[10] loss: 0.528, train acc: 84.792 test acc: 59.970  18.962 s
[11] loss: 0.528, train acc: 84.710 test acc: 60.090  18.949 s
[12] loss: 0.527, train acc: 84.716 test acc: 60.050  18.657 s
[13] loss: 0.525, train acc: 84.716 test acc: 60.180  18.807 s
[14] loss: 0.521, train acc: 84.866 test acc: 59.980  18.586 s
[15] loss: 0.522, train acc: 84.864 test acc: 60.010  19.012 s
[16] loss: 0.517, train acc: 85.004 test acc: 59.850  19.005 s
[17] loss: 0.520, train acc: 84.860 test acc: 60.080  19.120 s
[18] loss: 0.511, train acc: 85.258 test acc: 60.210  18.975 s
[19] loss: 0.513, train acc: 85.128 test acc: 60.210  19.032 s
[20] loss: 0.507, train acc: 85.348 test acc: 59.940  18.446 s
[1] loss: 0.501, train acc: 85.592 test acc: 60.100  18.988 s
[2] loss: 0.490, train acc: 86.018 test acc: 60.070  18.917 s
[3] loss: 0.488, train acc: 85.992 test acc: 59.990  18.860 s
[4] loss: 0.493, train acc: 86.016 test acc: 59.870  18.987 s
[5] loss: 0.485, train acc: 86.248 test acc: 60.040  18.584 s
[6] loss: 0.487, train acc: 86.264 test acc: 60.130  18.601 s
[7] loss: 0.486, train acc: 86.110 test acc: 60.160  18.754 s
[8] loss: 0.486, train acc: 86.056 test acc: 60.070  18.997 s
[9] loss: 0.485, train acc: 86.114 test acc: 60.190  18.654 s
[10] loss: 0.484, train acc: 86.144 test acc: 60.130  18.356 s
[11] loss: 0.482, train acc: 86.410 test acc: 59.970  18.743 s
[12] loss: 0.484, train acc: 86.180 test acc: 60.030  19.216 s
[13] loss: 0.482, train acc: 86.230 test acc: 60.250  20.355 s
[14] loss: 0.483, train acc: 86.010 test acc: 60.300  19.104 s
[15] loss: 0.482, train acc: 86.146 test acc: 59.910  18.860 s
[16] loss: 0.484, train acc: 86.202 test acc: 60.070  18.826 s
[17] loss: 0.480, train acc: 86.304 test acc: 60.060  18.555 s
[18] loss: 0.482, train acc: 86.260 test acc: 60.280  19.010 s
[19] loss: 0.481, train acc: 86.156 test acc: 60.300  18.804 s
[20] loss: 0.479, train acc: 86.360 test acc: 60.310  18.998 s
[1] loss: 0.479, train acc: 86.142 test acc: 60.280  18.646 s
[2] loss: 0.476, train acc: 86.300 test acc: 60.320  18.658 s
[3] loss: 0.475, train acc: 86.410 test acc: 60.240  19.096 s
[4] loss: 0.475, train acc: 86.532 test acc: 60.260  18.890 s
[5] loss: 0.476, train acc: 86.228 test acc: 60.250  19.536 s
[6] loss: 0.473, train acc: 86.540 test acc: 60.290  18.323 s
[7] loss: 0.476, train acc: 86.352 test acc: 60.230  19.586 s
[8] loss: 0.473, train acc: 86.520 test acc: 60.230  19.256 s
[9] loss: 0.472, train acc: 86.624 test acc: 60.310  18.598 s
[10] loss: 0.475, train acc: 86.556 test acc: 60.350  18.936 s
[11] loss: 0.475, train acc: 86.476 test acc: 60.380  18.681 s
[12] loss: 0.471, train acc: 86.486 test acc: 60.340  20.621 s
[13] loss: 0.474, train acc: 86.558 test acc: 60.310  18.922 s
[14] loss: 0.470, train acc: 86.620 test acc: 60.290  19.109 s
[15] loss: 0.473, train acc: 86.634 test acc: 60.170  19.187 s
[16] loss: 0.474, train acc: 86.436 test acc: 60.270  18.899 s
[17] loss: 0.471, train acc: 86.656 test acc: 60.280  19.279 s
[18] loss: 0.474, train acc: 86.480 test acc: 60.150  19.134 s
[19] loss: 0.471, train acc: 86.580 test acc: 60.200  18.532 s
[20] loss: 0.473, train acc: 86.662 test acc: 60.170  18.995 s
[1] loss: 1.106, train acc: 76.134 test acc: 62.780  38.125 s
[2] loss: 0.874, train acc: 80.666 test acc: 63.290  39.722 s
[3] loss: 0.838, train acc: 80.908 test acc: 63.320  38.934 s
[4] loss: 0.819, train acc: 81.398 test acc: 63.560  38.463 s
[5] loss: 0.810, train acc: 81.292 test acc: 63.210  38.697 s
[6] loss: 0.803, train acc: 81.268 test acc: 63.530  38.476 s
[7] loss: 0.793, train acc: 81.176 test acc: 63.700  38.083 s
[8] loss: 0.790, train acc: 81.434 test acc: 63.320  38.817 s
[9] loss: 0.787, train acc: 81.242 test acc: 63.570  38.433 s
[10] loss: 0.782, train acc: 81.380 test acc: 63.710  38.234 s
[11] loss: 0.778, train acc: 81.572 test acc: 63.640  39.205 s
[12] loss: 0.773, train acc: 81.422 test acc: 63.700  38.101 s
[13] loss: 0.767, train acc: 81.550 test acc: 63.580  38.276 s
[14] loss: 0.762, train acc: 81.648 test acc: 63.680  38.218 s
[15] loss: 0.766, train acc: 81.220 test acc: 63.710  38.191 s
[16] loss: 0.759, train acc: 81.704 test acc: 63.640  37.920 s
[17] loss: 0.756, train acc: 81.480 test acc: 63.790  38.715 s
[18] loss: 0.758, train acc: 81.528 test acc: 63.760  38.157 s
[19] loss: 0.756, train acc: 81.654 test acc: 63.840  38.704 s
[20] loss: 0.756, train acc: 81.532 test acc: 63.800  38.097 s
[21] loss: 0.752, train acc: 81.542 test acc: 63.900  38.504 s
[22] loss: 0.746, train acc: 81.598 test acc: 63.830  38.281 s
[23] loss: 0.747, train acc: 81.616 test acc: 63.760  38.159 s

restarting with half the learning rate, zero optimizer state
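
The restart noted above (half the learning rate, zeroed optimizer state) might look roughly like the sketch below. The log does not say which framework or optimizer was used, so `MomentumSGD` and `restart_with_half_lr` are hypothetical stand-ins written in plain Python, not the code that produced these numbers.

```python
from dataclasses import dataclass, field


@dataclass
class MomentumSGD:
    """Toy SGD-with-momentum optimizer (hypothetical stand-in, not the one used for this log)."""
    lr: float
    momentum: float = 0.9
    velocity: dict = field(default_factory=dict)  # per-parameter optimizer state

    def step(self, params: dict, grads: dict) -> None:
        # Update each parameter in place using its momentum buffer.
        for name, g in grads.items():
            v = self.momentum * self.velocity.get(name, 0.0) + g
            self.velocity[name] = v
            params[name] -= self.lr * v


def restart_with_half_lr(opt: MomentumSGD) -> MomentumSGD:
    """Return a fresh optimizer: learning rate halved, no state carried over."""
    return MomentumSGD(lr=opt.lr / 2, momentum=opt.momentum)


params = {"w": 1.0}
opt = MomentumSGD(lr=0.1)
opt.step(params, {"w": 2.0})      # the velocity buffer now holds state for "w"
opt = restart_with_half_lr(opt)   # weights are kept; only the optimizer is rebuilt
```

The model weights are untouched by the restart; only the step size and the accumulated momentum change, which is consistent with the loss continuing from about the same value in the lines that follow.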

[1] loss: 0.742, train acc: 81.706 test acc: 63.920  36.892 s
[2] loss: 0.743, train acc: 81.778 test acc: 63.970  36.748 s
[3] loss: 0.739, train acc: 81.960 test acc: 63.890  36.376 s
[4] loss: 0.737, train acc: 81.954 test acc: 63.770  35.944 s
[5] loss: 0.735, train acc: 81.996 test acc: 64.210  36.866 s
[6] loss: 0.734, train acc: 82.072 test acc: 63.930  36.578 s
[7] loss: 0.734, train acc: 81.916 test acc: 63.930  37.215 s
[8] loss: 0.729, train acc: 81.992 test acc: 63.880  36.817 s
[9] loss: 0.732, train acc: 82.108 test acc: 64.080  36.487 s
[10] loss: 0.728, train acc: 82.142 test acc: 64.070  36.806 s
[11] loss: 0.733, train acc: 81.934 test acc: 63.990  36.853 s
[1] loss: 0.781, train acc: 81.422 test acc: 63.790  37.518 s
[2] loss: 0.821, train acc: 80.904 test acc: 63.350  37.203 s
[3] loss: 0.841, train acc: 80.668 test acc: 63.400  37.730 s
[4] loss: 0.856, train acc: 80.196 test acc: 63.190  37.715 s
[5] loss: 0.866, train acc: 80.016 test acc: 63.070  37.500 s
[6] loss: 0.874, train acc: 79.680 test acc: 63.050  38.076 s
[7] loss: 0.881, train acc: 79.606 test acc: 63.030  37.768 s
[8] loss: 0.882, train acc: 79.624 test acc: 62.860  38.120 s
[9] loss: 0.884, train acc: 79.590 test acc: 62.980  37.331 s
[1] loss: 0.737, train acc: 81.764 test acc: 63.780  39.241 s
[2] loss: 0.706, train acc: 81.852 test acc: 64.160  38.618 s
[3] loss: 0.691, train acc: 82.076 test acc: 64.070  39.309 s
[4] loss: 0.686, train acc: 82.174 test acc: 64.260  38.344 s
[5] loss: 0.674, train acc: 82.528 test acc: 64.100  38.361 s
[6] loss: 0.673, train acc: 82.422 test acc: 64.480  38.350 s
[7] loss: 0.667, train acc: 82.700 test acc: 64.370  38.942 s
[8] loss: 0.665, train acc: 82.792 test acc: 64.400  38.189 s
[9] loss: 0.662, train acc: 82.726 test acc: 64.440  38.667 s
[10] loss: 0.660, train acc: 82.766 test acc: 64.370  39.073 s
[11] loss: 0.660, train acc: 82.808 test acc: 64.400  38.822 s
[12] loss: 0.653, train acc: 83.032 test acc: 64.430  38.702 s
[1] loss: 0.678, train acc: 82.660 test acc: 64.240  37.287 s
[2] loss: 0.688, train acc: 82.854 test acc: 64.310  37.077 s
[3] loss: 0.694, train acc: 82.710 test acc: 64.230  36.969 s
[4] loss: 0.701, train acc: 82.636 test acc: 64.210  36.958 s
[5] loss: 0.702, train acc: 82.640 test acc: 64.300  36.997 s
[6] loss: 0.704, train acc: 82.408 test acc: 64.180  37.049 s
[7] loss: 0.703, train acc: 82.806 test acc: 64.160  37.687 s
[8] loss: 0.710, train acc: 82.334 test acc: 63.980  37.277 s
[9] loss: 0.709, train acc: 82.544 test acc: 64.290  37.380 s
[10] loss: 0.706, train acc: 82.538 test acc: 64.070  37.523 s
[11] loss: 0.712, train acc: 82.400 test acc: 64.020  37.281 s
[12] loss: 0.708, train acc: 82.548 test acc: 63.950  36.890 s
[13] loss: 0.710, train acc: 82.606 test acc: 64.150  36.889 s
[14] loss: 0.709, train acc: 82.514 test acc: 64.210  38.943 s
[15] loss: 0.710, train acc: 82.704 test acc: 64.310  37.126 s
[16] loss: 0.710, train acc: 82.650 test acc: 64.090  36.937 s
[17] loss: 0.712, train acc: 82.526 test acc: 64.180  37.442 s
[18] loss: 0.710, train acc: 82.840 test acc: 64.070  37.089 s
[19] loss: 0.711, train acc: 82.582 test acc: 64.220  37.877 s
[20] loss: 0.710, train acc: 82.668 test acc: 64.150  37.814 s
[21] loss: 0.709, train acc: 82.544 test acc: 64.150  37.165 s
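
The epoch counter in this log resets to `[1]` several times without a separator, so the separate runs are easy to miss. A minimal sketch of splitting the log into runs, assuming every record follows the `[epoch] loss: ..., train acc: ... test acc: ...  ... s` pattern seen above (the names `LINE_RE` and `parse_runs` are illustrative, not part of the original tooling):

```python
import re

# One record per line: epoch, loss, train accuracy, test accuracy, seconds per epoch.
LINE_RE = re.compile(
    r"\[(\d+)\] loss: ([\d.]+), train acc: ([\d.]+) test acc: ([\d.]+)\s+([\d.]+) s"
)


def parse_runs(text):
    """Group log records into runs; a new run starts whenever the epoch is 1."""
    runs = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and free-text notes
        row = {
            "epoch": int(m.group(1)),
            "loss": float(m.group(2)),
            "train_acc": float(m.group(3)),
            "test_acc": float(m.group(4)),
            "seconds": float(m.group(5)),
        }
        if row["epoch"] == 1 or not runs:
            runs.append([])
        runs[-1].append(row)
    return runs


sample = """\
[22] loss: 0.746, train acc: 81.598 test acc: 63.830  38.281 s
[23] loss: 0.747, train acc: 81.616 test acc: 63.760  38.159 s

restarting with half the learning rate, zero optimizer state

[1] loss: 0.742, train acc: 81.706 test acc: 63.920  36.892 s
[2] loss: 0.743, train acc: 81.778 test acc: 63.970  36.748 s
"""
runs = parse_runs(sample)
best = max((row for run in runs for row in run), key=lambda r: r["test_acc"])
```

Splitting on `epoch == 1` is a heuristic: it would miss a restart that resumed numbering from a later epoch, which does not occur in this log.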