ALM-AHME committed on
Commit
840a518
1 Parent(s): 80d5c95

End of training

all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.9943422913719944,
+ "eval_loss": 0.022870095446705818,
+ "eval_runtime": 102.4976,
+ "eval_samples_per_second": 20.693,
+ "eval_steps_per_second": 2.595,
+ "total_flos": 1.0256650917539217e+19,
+ "train_loss": 0.15325962240578392,
+ "train_runtime": 8636.3959,
+ "train_samples_per_second": 5.157,
+ "train_steps_per_second": 0.161
+ }
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.9943422913719944,
+ "eval_loss": 0.022870095446705818,
+ "eval_runtime": 102.4976,
+ "eval_samples_per_second": 20.693,
+ "eval_steps_per_second": 2.595
+ }
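The evaluation metrics above can be cross-checked against each other. The following is a minimal sketch, assuming that eval_accuracy is plain top-1 accuracy over the whole evaluation set and that eval_samples_per_second is simply the sample count divided by eval_runtime. The function name estimate_eval_counts is hypothetical and is not part of this repository.

```python
# Values copied from eval_results.json in this commit.
eval_accuracy = 0.9943422913719944
eval_runtime = 102.4976            # seconds
eval_samples_per_second = 20.693


def estimate_eval_counts(accuracy, runtime, samples_per_second):
    """Estimate (n_samples, n_correct, n_wrong) from the reported metrics.

    Assumption: samples_per_second == n_samples / runtime, and accuracy is
    n_correct / n_samples, so both products should land near whole numbers.
    """
    n_samples = round(runtime * samples_per_second)
    n_correct = round(accuracy * n_samples)
    return n_samples, n_correct, n_samples - n_correct


n_samples, n_correct, n_wrong = estimate_eval_counts(
    eval_accuracy, eval_runtime, eval_samples_per_second
)
print(f"samples={n_samples} correct={n_correct} wrong={n_wrong}")
```

Under these assumptions the estimate comes out to roughly 2121 evaluation images with 12 misclassified, which is a useful reminder of how few errors separate the per-epoch accuracy values.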
runs/Jul16_09-38-00_4bb7fc5f194e/events.out.tfevents.1689509130.4bb7fc5f194e.338.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af0454a953e1d36b0829e1aa197d7fdb92d1c2163c7a92ae31c3fa61dc5df53b
+ size 411
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 7.0,
+ "total_flos": 1.0256650917539217e+19,
+ "train_loss": 0.15325962240578392,
+ "train_runtime": 8636.3959,
+ "train_samples_per_second": 5.157,
+ "train_steps_per_second": 0.161
+ }
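The learning_rate values logged in trainer_state.json (the next file in this commit) look consistent with a linear warmup followed by a linear decay. The sketch below reproduces them under inferred assumptions: a peak rate of 5e-5, 697 warmup steps, and 1393 total steps (the logged global_step). None of these hyperparameters are stated in the commit itself, and the function name linear_warmup_decay_lr is hypothetical.

```python
def linear_warmup_decay_lr(step, peak_lr=5e-5, warmup_steps=697, total_steps=1393):
    """Learning rate at a given optimizer step.

    Assumed schedule (inferred from the logged values, not confirmed by the
    commit): ramp linearly from 0 to peak_lr over warmup_steps, then decay
    linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Compare the sketch against a few (step, learning_rate) pairs from log_history.
logged = {
    5: 3.586800573888092e-07,
    695: 4.9856527977044475e-05,
    700: 4.978448275862069e-05,
    1045: 2.5e-05,
}
for step, lr in logged.items():
    print(step, lr, linear_warmup_decay_lr(step))
```

If the inferred parameters are right, warmup covers about half of the run (697 of 1393 steps), so the rate peaks midway through epoch 4.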
trainer_state.json ADDED
@@ -0,0 +1,1756 @@
+ {
+ "best_metric": 0.9943422913719944,
+ "best_model_checkpoint": "swinv2-large-patch4-window12to16-192to256-22kto1k-ft-finetuned-BreastCancer-BreakHis-AH-60-20-20/checkpoint-1194",
+ "epoch": 7.0,
+ "global_step": 1393,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.03,
+ "learning_rate": 3.586800573888092e-07,
+ "loss": 0.7746,
+ "step": 5
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 7.173601147776184e-07,
+ "loss": 0.739,
+ "step": 10
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 1.0760401721664278e-06,
+ "loss": 0.7228,
+ "step": 15
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 1.4347202295552369e-06,
+ "loss": 0.7082,
+ "step": 20
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 1.793400286944046e-06,
+ "loss": 0.6802,
+ "step": 25
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 2.1520803443328555e-06,
+ "loss": 0.6662,
+ "step": 30
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 2.5107604017216644e-06,
+ "loss": 0.631,
+ "step": 35
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 2.8694404591104738e-06,
+ "loss": 0.6168,
+ "step": 40
+ },
+ {
+ "epoch": 0.23,
+ "learning_rate": 3.2281205164992827e-06,
+ "loss": 0.5649,
+ "step": 45
+ },
+ {
+ "epoch": 0.25,
+ "learning_rate": 3.586800573888092e-06,
+ "loss": 0.5661,
+ "step": 50
+ },
+ {
+ "epoch": 0.28,
+ "learning_rate": 3.945480631276901e-06,
+ "loss": 0.4748,
+ "step": 55
+ },
+ {
+ "epoch": 0.3,
+ "learning_rate": 4.304160688665711e-06,
+ "loss": 0.4842,
+ "step": 60
+ },
+ {
+ "epoch": 0.33,
+ "learning_rate": 4.66284074605452e-06,
+ "loss": 0.4239,
+ "step": 65
+ },
+ {
+ "epoch": 0.35,
+ "learning_rate": 5.021520803443329e-06,
+ "loss": 0.4256,
+ "step": 70
+ },
+ {
+ "epoch": 0.38,
+ "learning_rate": 5.380200860832138e-06,
+ "loss": 0.3675,
+ "step": 75
+ },
+ {
+ "epoch": 0.4,
+ "learning_rate": 5.7388809182209475e-06,
+ "loss": 0.3646,
+ "step": 80
+ },
+ {
+ "epoch": 0.43,
+ "learning_rate": 6.0975609756097564e-06,
+ "loss": 0.3939,
+ "step": 85
+ },
+ {
+ "epoch": 0.45,
+ "learning_rate": 6.456241032998565e-06,
+ "loss": 0.3783,
+ "step": 90
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 6.814921090387374e-06,
+ "loss": 0.4425,
+ "step": 95
+ },
+ {
+ "epoch": 0.5,
+ "learning_rate": 7.173601147776184e-06,
+ "loss": 0.348,
+ "step": 100
+ },
+ {
+ "epoch": 0.53,
+ "learning_rate": 7.532281205164993e-06,
+ "loss": 0.3337,
+ "step": 105
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 7.890961262553803e-06,
+ "loss": 0.2808,
+ "step": 110
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 8.249641319942612e-06,
+ "loss": 0.3154,
+ "step": 115
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 8.608321377331422e-06,
+ "loss": 0.2637,
+ "step": 120
+ },
+ {
+ "epoch": 0.63,
+ "learning_rate": 8.967001434720231e-06,
+ "loss": 0.2857,
+ "step": 125
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 9.32568149210904e-06,
+ "loss": 0.2633,
+ "step": 130
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 9.684361549497849e-06,
+ "loss": 0.3059,
+ "step": 135
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 1.0043041606886658e-05,
+ "loss": 0.2524,
+ "step": 140
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 1.0401721664275467e-05,
+ "loss": 0.2154,
+ "step": 145
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 1.0760401721664276e-05,
+ "loss": 0.3399,
+ "step": 150
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 1.1119081779053084e-05,
+ "loss": 0.2305,
+ "step": 155
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 1.1477761836441895e-05,
+ "loss": 0.2263,
+ "step": 160
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 1.1836441893830704e-05,
+ "loss": 0.1931,
+ "step": 165
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 1.2195121951219513e-05,
+ "loss": 0.2317,
+ "step": 170
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 1.2553802008608323e-05,
+ "loss": 0.2118,
+ "step": 175
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 1.291248206599713e-05,
+ "loss": 0.1408,
+ "step": 180
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 1.3271162123385941e-05,
+ "loss": 0.24,
+ "step": 185
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 1.3629842180774748e-05,
+ "loss": 0.2579,
+ "step": 190
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 1.3988522238163559e-05,
+ "loss": 0.2053,
+ "step": 195
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.9495520980669495,
+ "eval_loss": 0.1227400153875351,
+ "eval_runtime": 524.4475,
+ "eval_samples_per_second": 4.044,
+ "eval_steps_per_second": 0.507,
+ "step": 199
+ },
+ {
+ "epoch": 1.01,
+ "learning_rate": 1.4347202295552368e-05,
+ "loss": 0.2,
+ "step": 200
+ },
+ {
+ "epoch": 1.03,
+ "learning_rate": 1.4705882352941177e-05,
+ "loss": 0.2002,
+ "step": 205
+ },
+ {
+ "epoch": 1.06,
+ "learning_rate": 1.5064562410329986e-05,
+ "loss": 0.2141,
+ "step": 210
+ },
+ {
+ "epoch": 1.08,
+ "learning_rate": 1.5423242467718796e-05,
+ "loss": 0.2358,
+ "step": 215
+ },
+ {
+ "epoch": 1.11,
+ "learning_rate": 1.5781922525107605e-05,
+ "loss": 0.1788,
+ "step": 220
+ },
+ {
+ "epoch": 1.13,
+ "learning_rate": 1.6140602582496414e-05,
+ "loss": 0.0829,
+ "step": 225
+ },
+ {
+ "epoch": 1.16,
+ "learning_rate": 1.6499282639885223e-05,
+ "loss": 0.205,
+ "step": 230
+ },
+ {
+ "epoch": 1.18,
+ "learning_rate": 1.6857962697274032e-05,
+ "loss": 0.1433,
+ "step": 235
+ },
+ {
+ "epoch": 1.21,
+ "learning_rate": 1.7216642754662844e-05,
+ "loss": 0.162,
+ "step": 240
+ },
+ {
+ "epoch": 1.23,
+ "learning_rate": 1.757532281205165e-05,
+ "loss": 0.2635,
+ "step": 245
+ },
+ {
+ "epoch": 1.26,
+ "learning_rate": 1.7934002869440462e-05,
+ "loss": 0.2703,
+ "step": 250
+ },
+ {
+ "epoch": 1.28,
+ "learning_rate": 1.8292682926829268e-05,
+ "loss": 0.239,
+ "step": 255
+ },
+ {
+ "epoch": 1.31,
+ "learning_rate": 1.865136298421808e-05,
+ "loss": 0.2087,
+ "step": 260
+ },
+ {
+ "epoch": 1.33,
+ "learning_rate": 1.9010043041606885e-05,
+ "loss": 0.206,
+ "step": 265
+ },
+ {
+ "epoch": 1.36,
+ "learning_rate": 1.9368723098995698e-05,
+ "loss": 0.1916,
+ "step": 270
+ },
+ {
+ "epoch": 1.38,
+ "learning_rate": 1.9727403156384507e-05,
+ "loss": 0.2102,
+ "step": 275
+ },
+ {
+ "epoch": 1.41,
+ "learning_rate": 2.0086083213773316e-05,
+ "loss": 0.1304,
+ "step": 280
+ },
+ {
+ "epoch": 1.43,
+ "learning_rate": 2.0444763271162124e-05,
+ "loss": 0.1707,
+ "step": 285
+ },
+ {
+ "epoch": 1.46,
+ "learning_rate": 2.0803443328550933e-05,
+ "loss": 0.0719,
+ "step": 290
+ },
+ {
+ "epoch": 1.48,
+ "learning_rate": 2.1162123385939742e-05,
+ "loss": 0.22,
+ "step": 295
+ },
+ {
+ "epoch": 1.51,
+ "learning_rate": 2.152080344332855e-05,
+ "loss": 0.2759,
+ "step": 300
+ },
+ {
+ "epoch": 1.53,
+ "learning_rate": 2.187948350071736e-05,
+ "loss": 0.2507,
+ "step": 305
+ },
+ {
+ "epoch": 1.56,
+ "learning_rate": 2.223816355810617e-05,
+ "loss": 0.2151,
+ "step": 310
+ },
+ {
+ "epoch": 1.58,
+ "learning_rate": 2.259684361549498e-05,
+ "loss": 0.1902,
+ "step": 315
+ },
+ {
+ "epoch": 1.61,
+ "learning_rate": 2.295552367288379e-05,
+ "loss": 0.3037,
+ "step": 320
+ },
+ {
+ "epoch": 1.63,
+ "learning_rate": 2.33142037302726e-05,
+ "loss": 0.2278,
+ "step": 325
+ },
+ {
+ "epoch": 1.66,
+ "learning_rate": 2.3672883787661408e-05,
+ "loss": 0.2791,
+ "step": 330
+ },
+ {
+ "epoch": 1.68,
+ "learning_rate": 2.4031563845050217e-05,
+ "loss": 0.1363,
+ "step": 335
+ },
+ {
+ "epoch": 1.71,
+ "learning_rate": 2.4390243902439026e-05,
+ "loss": 0.2639,
+ "step": 340
+ },
+ {
+ "epoch": 1.73,
+ "learning_rate": 2.4748923959827835e-05,
+ "loss": 0.108,
+ "step": 345
+ },
+ {
+ "epoch": 1.76,
+ "learning_rate": 2.5107604017216647e-05,
+ "loss": 0.1188,
+ "step": 350
+ },
+ {
+ "epoch": 1.78,
+ "learning_rate": 2.5466284074605452e-05,
+ "loss": 0.1908,
+ "step": 355
+ },
+ {
+ "epoch": 1.81,
+ "learning_rate": 2.582496413199426e-05,
+ "loss": 0.1316,
+ "step": 360
+ },
+ {
+ "epoch": 1.83,
+ "learning_rate": 2.6183644189383074e-05,
+ "loss": 0.2164,
+ "step": 365
+ },
+ {
+ "epoch": 1.86,
+ "learning_rate": 2.6542324246771883e-05,
+ "loss": 0.1956,
+ "step": 370
+ },
+ {
+ "epoch": 1.88,
+ "learning_rate": 2.6901004304160688e-05,
+ "loss": 0.1178,
+ "step": 375
+ },
+ {
+ "epoch": 1.91,
+ "learning_rate": 2.7259684361549497e-05,
+ "loss": 0.2018,
+ "step": 380
+ },
+ {
+ "epoch": 1.93,
+ "learning_rate": 2.761836441893831e-05,
+ "loss": 0.1793,
+ "step": 385
+ },
+ {
+ "epoch": 1.96,
+ "learning_rate": 2.7977044476327118e-05,
+ "loss": 0.1478,
+ "step": 390
+ },
+ {
+ "epoch": 1.98,
+ "learning_rate": 2.833572453371593e-05,
+ "loss": 0.1302,
+ "step": 395
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.9735973597359736,
+ "eval_loss": 0.06645132601261139,
+ "eval_runtime": 102.5426,
+ "eval_samples_per_second": 20.684,
+ "eval_steps_per_second": 2.594,
+ "step": 398
+ },
+ {
+ "epoch": 2.01,
+ "learning_rate": 2.8694404591104736e-05,
+ "loss": 0.137,
+ "step": 400
+ },
+ {
+ "epoch": 2.04,
+ "learning_rate": 2.9053084648493545e-05,
+ "loss": 0.1309,
+ "step": 405
+ },
+ {
+ "epoch": 2.06,
+ "learning_rate": 2.9411764705882354e-05,
+ "loss": 0.1257,
+ "step": 410
+ },
+ {
+ "epoch": 2.09,
+ "learning_rate": 2.9770444763271166e-05,
+ "loss": 0.1457,
+ "step": 415
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 3.012912482065997e-05,
+ "loss": 0.259,
+ "step": 420
+ },
+ {
+ "epoch": 2.14,
+ "learning_rate": 3.048780487804878e-05,
+ "loss": 0.1984,
+ "step": 425
+ },
+ {
+ "epoch": 2.16,
+ "learning_rate": 3.084648493543759e-05,
+ "loss": 0.1493,
+ "step": 430
+ },
+ {
+ "epoch": 2.19,
+ "learning_rate": 3.1205164992826405e-05,
+ "loss": 0.1391,
+ "step": 435
+ },
+ {
+ "epoch": 2.21,
+ "learning_rate": 3.156384505021521e-05,
+ "loss": 0.1312,
+ "step": 440
+ },
+ {
+ "epoch": 2.24,
+ "learning_rate": 3.1922525107604016e-05,
+ "loss": 0.1164,
+ "step": 445
+ },
+ {
+ "epoch": 2.26,
+ "learning_rate": 3.228120516499283e-05,
+ "loss": 0.1189,
+ "step": 450
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 3.263988522238164e-05,
+ "loss": 0.1296,
+ "step": 455
+ },
+ {
+ "epoch": 2.31,
+ "learning_rate": 3.2998565279770446e-05,
+ "loss": 0.078,
+ "step": 460
+ },
+ {
+ "epoch": 2.34,
+ "learning_rate": 3.335724533715925e-05,
+ "loss": 0.0943,
+ "step": 465
+ },
+ {
+ "epoch": 2.36,
+ "learning_rate": 3.3715925394548064e-05,
+ "loss": 0.0932,
+ "step": 470
+ },
+ {
+ "epoch": 2.39,
+ "learning_rate": 3.4074605451936876e-05,
+ "loss": 0.1722,
+ "step": 475
+ },
+ {
+ "epoch": 2.41,
+ "learning_rate": 3.443328550932569e-05,
+ "loss": 0.2213,
+ "step": 480
+ },
+ {
+ "epoch": 2.44,
+ "learning_rate": 3.479196556671449e-05,
+ "loss": 0.1573,
+ "step": 485
+ },
+ {
+ "epoch": 2.46,
+ "learning_rate": 3.51506456241033e-05,
+ "loss": 0.2181,
+ "step": 490
+ },
+ {
+ "epoch": 2.49,
+ "learning_rate": 3.550932568149211e-05,
+ "loss": 0.1229,
+ "step": 495
+ },
+ {
+ "epoch": 2.51,
+ "learning_rate": 3.5868005738880924e-05,
+ "loss": 0.0817,
+ "step": 500
+ },
+ {
+ "epoch": 2.54,
+ "learning_rate": 3.622668579626973e-05,
+ "loss": 0.0617,
+ "step": 505
+ },
+ {
+ "epoch": 2.56,
+ "learning_rate": 3.6585365853658535e-05,
+ "loss": 0.2366,
+ "step": 510
+ },
+ {
+ "epoch": 2.59,
+ "learning_rate": 3.694404591104735e-05,
+ "loss": 0.1142,
+ "step": 515
+ },
+ {
+ "epoch": 2.61,
+ "learning_rate": 3.730272596843616e-05,
+ "loss": 0.1394,
+ "step": 520
+ },
+ {
+ "epoch": 2.64,
+ "learning_rate": 3.7661406025824965e-05,
+ "loss": 0.1163,
+ "step": 525
+ },
+ {
+ "epoch": 2.66,
+ "learning_rate": 3.802008608321377e-05,
+ "loss": 0.1146,
+ "step": 530
+ },
+ {
+ "epoch": 2.69,
+ "learning_rate": 3.837876614060258e-05,
+ "loss": 0.1014,
+ "step": 535
+ },
+ {
+ "epoch": 2.71,
+ "learning_rate": 3.8737446197991395e-05,
+ "loss": 0.0707,
+ "step": 540
+ },
+ {
+ "epoch": 2.74,
+ "learning_rate": 3.909612625538021e-05,
+ "loss": 0.1651,
+ "step": 545
+ },
+ {
+ "epoch": 2.76,
+ "learning_rate": 3.945480631276901e-05,
+ "loss": 0.3627,
+ "step": 550
+ },
+ {
+ "epoch": 2.79,
+ "learning_rate": 3.981348637015782e-05,
+ "loss": 0.1615,
+ "step": 555
+ },
+ {
+ "epoch": 2.81,
+ "learning_rate": 4.017216642754663e-05,
+ "loss": 0.2595,
+ "step": 560
+ },
+ {
+ "epoch": 2.84,
+ "learning_rate": 4.053084648493544e-05,
+ "loss": 0.1507,
+ "step": 565
+ },
+ {
+ "epoch": 2.86,
+ "learning_rate": 4.088952654232425e-05,
+ "loss": 0.1336,
+ "step": 570
+ },
+ {
+ "epoch": 2.89,
+ "learning_rate": 4.1248206599713054e-05,
+ "loss": 0.093,
+ "step": 575
+ },
+ {
+ "epoch": 2.91,
+ "learning_rate": 4.160688665710187e-05,
+ "loss": 0.1895,
+ "step": 580
+ },
+ {
+ "epoch": 2.94,
+ "learning_rate": 4.196556671449068e-05,
+ "loss": 0.1971,
+ "step": 585
+ },
+ {
+ "epoch": 2.96,
+ "learning_rate": 4.2324246771879484e-05,
+ "loss": 0.1536,
+ "step": 590
+ },
+ {
+ "epoch": 2.99,
+ "learning_rate": 4.26829268292683e-05,
+ "loss": 0.0784,
+ "step": 595
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.9778406412069779,
+ "eval_loss": 0.05999346077442169,
+ "eval_runtime": 102.3498,
+ "eval_samples_per_second": 20.723,
+ "eval_steps_per_second": 2.599,
+ "step": 597
+ },
+ {
+ "epoch": 3.02,
+ "learning_rate": 4.30416068866571e-05,
+ "loss": 0.1106,
+ "step": 600
+ },
+ {
+ "epoch": 3.04,
+ "learning_rate": 4.3400286944045915e-05,
+ "loss": 0.1983,
+ "step": 605
+ },
+ {
+ "epoch": 3.07,
+ "learning_rate": 4.375896700143472e-05,
+ "loss": 0.1241,
+ "step": 610
+ },
+ {
+ "epoch": 3.09,
+ "learning_rate": 4.411764705882353e-05,
+ "loss": 0.1141,
+ "step": 615
+ },
+ {
+ "epoch": 3.12,
+ "learning_rate": 4.447632711621234e-05,
+ "loss": 0.1416,
+ "step": 620
+ },
+ {
+ "epoch": 3.14,
+ "learning_rate": 4.483500717360115e-05,
+ "loss": 0.1494,
+ "step": 625
+ },
+ {
+ "epoch": 3.17,
+ "learning_rate": 4.519368723098996e-05,
+ "loss": 0.0762,
+ "step": 630
+ },
+ {
+ "epoch": 3.19,
+ "learning_rate": 4.555236728837877e-05,
+ "loss": 0.1448,
+ "step": 635
+ },
+ {
+ "epoch": 3.22,
+ "learning_rate": 4.591104734576758e-05,
+ "loss": 0.1665,
+ "step": 640
+ },
+ {
+ "epoch": 3.24,
+ "learning_rate": 4.6269727403156386e-05,
+ "loss": 0.0737,
+ "step": 645
+ },
+ {
+ "epoch": 3.27,
+ "learning_rate": 4.66284074605452e-05,
+ "loss": 0.0924,
+ "step": 650
+ },
+ {
+ "epoch": 3.29,
+ "learning_rate": 4.6987087517934004e-05,
+ "loss": 0.1181,
+ "step": 655
+ },
+ {
+ "epoch": 3.32,
+ "learning_rate": 4.7345767575322816e-05,
+ "loss": 0.2936,
+ "step": 660
+ },
+ {
+ "epoch": 3.34,
+ "learning_rate": 4.770444763271162e-05,
+ "loss": 0.1374,
+ "step": 665
+ },
+ {
+ "epoch": 3.37,
+ "learning_rate": 4.8063127690100434e-05,
+ "loss": 0.1504,
+ "step": 670
+ },
+ {
+ "epoch": 3.39,
+ "learning_rate": 4.842180774748924e-05,
+ "loss": 0.2001,
+ "step": 675
+ },
+ {
+ "epoch": 3.42,
+ "learning_rate": 4.878048780487805e-05,
+ "loss": 0.0628,
+ "step": 680
+ },
+ {
+ "epoch": 3.44,
+ "learning_rate": 4.9139167862266864e-05,
+ "loss": 0.116,
+ "step": 685
+ },
+ {
+ "epoch": 3.47,
+ "learning_rate": 4.949784791965567e-05,
+ "loss": 0.1345,
+ "step": 690
+ },
+ {
+ "epoch": 3.49,
+ "learning_rate": 4.9856527977044475e-05,
+ "loss": 0.119,
+ "step": 695
+ },
+ {
+ "epoch": 3.52,
+ "learning_rate": 4.978448275862069e-05,
+ "loss": 0.0882,
+ "step": 700
+ },
+ {
+ "epoch": 3.54,
+ "learning_rate": 4.9425287356321845e-05,
+ "loss": 0.0853,
+ "step": 705
+ },
+ {
+ "epoch": 3.57,
+ "learning_rate": 4.906609195402299e-05,
+ "loss": 0.2363,
+ "step": 710
+ },
+ {
+ "epoch": 3.59,
+ "learning_rate": 4.870689655172414e-05,
+ "loss": 0.1878,
+ "step": 715
+ },
+ {
+ "epoch": 3.62,
+ "learning_rate": 4.834770114942529e-05,
+ "loss": 0.0954,
+ "step": 720
+ },
+ {
+ "epoch": 3.64,
+ "learning_rate": 4.798850574712644e-05,
+ "loss": 0.1891,
+ "step": 725
+ },
+ {
+ "epoch": 3.67,
+ "learning_rate": 4.762931034482759e-05,
+ "loss": 0.1562,
+ "step": 730
+ },
+ {
+ "epoch": 3.69,
+ "learning_rate": 4.7270114942528734e-05,
+ "loss": 0.1207,
+ "step": 735
+ },
+ {
+ "epoch": 3.72,
+ "learning_rate": 4.6910919540229886e-05,
+ "loss": 0.096,
+ "step": 740
+ },
+ {
+ "epoch": 3.74,
+ "learning_rate": 4.655172413793104e-05,
+ "loss": 0.1072,
+ "step": 745
+ },
+ {
+ "epoch": 3.77,
+ "learning_rate": 4.619252873563218e-05,
+ "loss": 0.0718,
+ "step": 750
+ },
+ {
+ "epoch": 3.79,
+ "learning_rate": 4.5833333333333334e-05,
+ "loss": 0.0622,
+ "step": 755
+ },
+ {
+ "epoch": 3.82,
+ "learning_rate": 4.5474137931034485e-05,
+ "loss": 0.1661,
+ "step": 760
+ },
+ {
+ "epoch": 3.84,
+ "learning_rate": 4.511494252873563e-05,
+ "loss": 0.1559,
+ "step": 765
+ },
+ {
+ "epoch": 3.87,
+ "learning_rate": 4.475574712643678e-05,
+ "loss": 0.1195,
+ "step": 770
+ },
+ {
+ "epoch": 3.89,
+ "learning_rate": 4.4396551724137933e-05,
+ "loss": 0.105,
+ "step": 775
+ },
+ {
+ "epoch": 3.92,
+ "learning_rate": 4.4037356321839085e-05,
+ "loss": 0.0764,
+ "step": 780
+ },
+ {
+ "epoch": 3.94,
+ "learning_rate": 4.367816091954024e-05,
+ "loss": 0.1322,
+ "step": 785
+ },
+ {
+ "epoch": 3.97,
+ "learning_rate": 4.331896551724138e-05,
+ "loss": 0.0829,
+ "step": 790
+ },
+ {
+ "epoch": 3.99,
+ "learning_rate": 4.295977011494253e-05,
+ "loss": 0.1181,
+ "step": 795
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.9849127769919849,
+ "eval_loss": 0.04494262859225273,
+ "eval_runtime": 102.4081,
+ "eval_samples_per_second": 20.711,
+ "eval_steps_per_second": 2.597,
+ "step": 796
+ },
+ {
+ "epoch": 4.02,
+ "learning_rate": 4.2600574712643685e-05,
+ "loss": 0.1042,
+ "step": 800
+ },
+ {
+ "epoch": 4.05,
+ "learning_rate": 4.224137931034483e-05,
+ "loss": 0.0089,
+ "step": 805
+ },
+ {
+ "epoch": 4.07,
+ "learning_rate": 4.188218390804598e-05,
+ "loss": 0.1135,
+ "step": 810
+ },
+ {
+ "epoch": 4.1,
+ "learning_rate": 4.1522988505747126e-05,
+ "loss": 0.0473,
+ "step": 815
+ },
+ {
+ "epoch": 4.12,
+ "learning_rate": 4.116379310344828e-05,
+ "loss": 0.2587,
+ "step": 820
+ },
+ {
+ "epoch": 4.15,
+ "learning_rate": 4.080459770114943e-05,
+ "loss": 0.1912,
+ "step": 825
+ },
+ {
+ "epoch": 4.17,
+ "learning_rate": 4.0445402298850574e-05,
+ "loss": 0.1441,
+ "step": 830
+ },
+ {
+ "epoch": 4.2,
+ "learning_rate": 4.0086206896551726e-05,
+ "loss": 0.0334,
+ "step": 835
+ },
+ {
+ "epoch": 4.22,
+ "learning_rate": 3.972701149425288e-05,
+ "loss": 0.0537,
+ "step": 840
+ },
+ {
+ "epoch": 4.25,
+ "learning_rate": 3.936781609195402e-05,
+ "loss": 0.1477,
+ "step": 845
+ },
+ {
+ "epoch": 4.27,
+ "learning_rate": 3.9008620689655174e-05,
+ "loss": 0.0594,
+ "step": 850
+ },
+ {
+ "epoch": 4.3,
+ "learning_rate": 3.8649425287356325e-05,
+ "loss": 0.0833,
+ "step": 855
+ },
+ {
+ "epoch": 4.32,
+ "learning_rate": 3.829022988505747e-05,
+ "loss": 0.1532,
+ "step": 860
+ },
+ {
+ "epoch": 4.35,
+ "learning_rate": 3.793103448275862e-05,
+ "loss": 0.1087,
+ "step": 865
+ },
+ {
+ "epoch": 4.37,
+ "learning_rate": 3.7571839080459766e-05,
+ "loss": 0.1735,
+ "step": 870
+ },
+ {
+ "epoch": 4.4,
+ "learning_rate": 3.721264367816092e-05,
+ "loss": 0.1434,
+ "step": 875
+ },
+ {
+ "epoch": 4.42,
+ "learning_rate": 3.685344827586207e-05,
+ "loss": 0.1227,
+ "step": 880
+ },
+ {
+ "epoch": 4.45,
+ "learning_rate": 3.649425287356322e-05,
+ "loss": 0.1714,
+ "step": 885
+ },
+ {
+ "epoch": 4.47,
+ "learning_rate": 3.613505747126437e-05,
+ "loss": 0.1588,
+ "step": 890
+ },
+ {
+ "epoch": 4.5,
+ "learning_rate": 3.5775862068965524e-05,
+ "loss": 0.1283,
+ "step": 895
+ },
+ {
+ "epoch": 4.52,
+ "learning_rate": 3.541666666666667e-05,
+ "loss": 0.0578,
+ "step": 900
+ },
+ {
+ "epoch": 4.55,
+ "learning_rate": 3.505747126436782e-05,
+ "loss": 0.119,
+ "step": 905
+ },
+ {
+ "epoch": 4.57,
+ "learning_rate": 3.4698275862068966e-05,
+ "loss": 0.0986,
+ "step": 910
+ },
+ {
+ "epoch": 4.6,
+ "learning_rate": 3.433908045977012e-05,
+ "loss": 0.0577,
+ "step": 915
+ },
+ {
+ "epoch": 4.62,
+ "learning_rate": 3.397988505747127e-05,
+ "loss": 0.0148,
+ "step": 920
+ },
+ {
+ "epoch": 4.65,
+ "learning_rate": 3.3620689655172414e-05,
+ "loss": 0.0293,
+ "step": 925
+ },
+ {
+ "epoch": 4.67,
+ "learning_rate": 3.3261494252873565e-05,
+ "loss": 0.0276,
+ "step": 930
+ },
+ {
+ "epoch": 4.7,
+ "learning_rate": 3.290229885057472e-05,
+ "loss": 0.0033,
+ "step": 935
+ },
+ {
+ "epoch": 4.72,
+ "learning_rate": 3.254310344827586e-05,
+ "loss": 0.0977,
+ "step": 940
+ },
+ {
+ "epoch": 4.75,
+ "learning_rate": 3.218390804597701e-05,
+ "loss": 0.0749,
+ "step": 945
+ },
+ {
+ "epoch": 4.77,
+ "learning_rate": 3.1824712643678165e-05,
+ "loss": 0.0323,
+ "step": 950
+ },
+ {
+ "epoch": 4.8,
+ "learning_rate": 3.146551724137931e-05,
+ "loss": 0.1322,
+ "step": 955
+ },
+ {
+ "epoch": 4.82,
+ "learning_rate": 3.110632183908046e-05,
+ "loss": 0.081,
+ "step": 960
+ },
+ {
+ "epoch": 4.85,
+ "learning_rate": 3.0747126436781606e-05,
+ "loss": 0.1045,
+ "step": 965
+ },
+ {
+ "epoch": 4.87,
+ "learning_rate": 3.0387931034482758e-05,
+ "loss": 0.1323,
+ "step": 970
+ },
+ {
+ "epoch": 4.9,
+ "learning_rate": 3.0028735632183906e-05,
+ "loss": 0.2384,
+ "step": 975
+ },
+ {
+ "epoch": 4.92,
+ "learning_rate": 2.9669540229885058e-05,
+ "loss": 0.0663,
+ "step": 980
+ },
+ {
+ "epoch": 4.95,
+ "learning_rate": 2.9310344827586206e-05,
+ "loss": 0.0836,
+ "step": 985
+ },
+ {
+ "epoch": 4.97,
+ "learning_rate": 2.8951149425287354e-05,
+ "loss": 0.1856,
+ "step": 990
+ },
+ {
+ "epoch": 5.0,
+ "learning_rate": 2.859195402298851e-05,
+ "loss": 0.208,
+ "step": 995
+ },
+ {
+ "epoch": 5.0,
+ "eval_accuracy": 0.9886845827439887,
+ "eval_loss": 0.03925405070185661,
+ "eval_runtime": 102.5066,
+ "eval_samples_per_second": 20.691,
+ "eval_steps_per_second": 2.595,
+ "step": 995
+ },
+ {
+ "epoch": 5.03,
+ "learning_rate": 2.8232758620689657e-05,
+ "loss": 0.0379,
+ "step": 1000
+ },
+ {
+ "epoch": 5.05,
+ "learning_rate": 2.787356321839081e-05,
+ "loss": 0.1243,
+ "step": 1005
+ },
+ {
+ "epoch": 5.08,
+ "learning_rate": 2.7514367816091957e-05,
+ "loss": 0.0983,
+ "step": 1010
+ },
+ {
+ "epoch": 5.1,
+ "learning_rate": 2.7155172413793105e-05,
+ "loss": 0.0448,
+ "step": 1015
+ },
+ {
+ "epoch": 5.13,
+ "learning_rate": 2.6795977011494257e-05,
+ "loss": 0.1149,
+ "step": 1020
+ },
+ {
+ "epoch": 5.15,
+ "learning_rate": 2.6436781609195405e-05,
+ "loss": 0.0619,
+ "step": 1025
+ },
+ {
+ "epoch": 5.18,
+ "learning_rate": 2.6077586206896553e-05,
+ "loss": 0.1441,
+ "step": 1030
+ },
+ {
+ "epoch": 5.2,
+ "learning_rate": 2.57183908045977e-05,
+ "loss": 0.0575,
+ "step": 1035
+ },
+ {
+ "epoch": 5.23,
+ "learning_rate": 2.5359195402298853e-05,
+ "loss": 0.067,
+ "step": 1040
+ },
+ {
+ "epoch": 5.25,
+ "learning_rate": 2.5e-05,
+ "loss": 0.0616,
+ "step": 1045
+ },
+ {
+ "epoch": 5.28,
+ "learning_rate": 2.464080459770115e-05,
+ "loss": 0.0797,
+ "step": 1050
+ },
+ {
+ "epoch": 5.3,
+ "learning_rate": 2.42816091954023e-05,
+ "loss": 0.043,
+ "step": 1055
+ },
+ {
+ "epoch": 5.33,
+ "learning_rate": 2.392241379310345e-05,
+ "loss": 0.0809,
+ "step": 1060
+ },
+ {
+ "epoch": 5.35,
+ "learning_rate": 2.3563218390804597e-05,
+ "loss": 0.0548,
+ "step": 1065
+ },
+ {
+ "epoch": 5.38,
+ "learning_rate": 2.3204022988505746e-05,
+ "loss": 0.0662,
+ "step": 1070
+ },
+ {
+ "epoch": 5.4,
+ "learning_rate": 2.2844827586206897e-05,
+ "loss": 0.1148,
+ "step": 1075
+ },
+ {
+ "epoch": 5.43,
+ "learning_rate": 2.248563218390805e-05,
+ "loss": 0.0548,
+ "step": 1080
+ },
+ {
+ "epoch": 5.45,
+ "learning_rate": 2.2126436781609197e-05,
+ "loss": 0.0658,
+ "step": 1085
+ },
+ {
+ "epoch": 5.48,
+ "learning_rate": 2.1767241379310345e-05,
+ "loss": 0.0686,
+ "step": 1090
+ },
+ {
+ "epoch": 5.5,
+ "learning_rate": 2.1408045977011497e-05,
+ "loss": 0.1114,
+ "step": 1095
+ },
+ {
+ "epoch": 5.53,
+ "learning_rate": 2.1048850574712645e-05,
+ "loss": 0.0262,
+ "step": 1100
+ },
+ {
+ "epoch": 5.55,
+ "learning_rate": 2.0689655172413793e-05,
+ "loss": 0.0745,
+ "step": 1105
+ },
+ {
+ "epoch": 5.58,
+ "learning_rate": 2.033045977011494e-05,
+ "loss": 0.1139,
+ "step": 1110
+ },
+ {
+ "epoch": 5.6,
+ "learning_rate": 1.9971264367816093e-05,
+ "loss": 0.1325,
+ "step": 1115
+ },
+ {
+ "epoch": 5.63,
+ "learning_rate": 1.961206896551724e-05,
+ "loss": 0.0565,
+ "step": 1120
+ },
+ {
+ "epoch": 5.65,
+ "learning_rate": 1.925287356321839e-05,
+ "loss": 0.0274,
+ "step": 1125
+ },
+ {
+ "epoch": 5.68,
+ "learning_rate": 1.889367816091954e-05,
+ "loss": 0.1295,
+ "step": 1130
+ },
+ {
+ "epoch": 5.7,
+ "learning_rate": 1.8534482758620693e-05,
+ "loss": 0.0501,
+ "step": 1135
+ },
+ {
+ "epoch": 5.73,
+ "learning_rate": 1.817528735632184e-05,
+ "loss": 0.0586,
+ "step": 1140
+ },
+ {
+ "epoch": 5.75,
+ "learning_rate": 1.781609195402299e-05,
+ "loss": 0.045,
+ "step": 1145
+ },
+ {
+ "epoch": 5.78,
+ "learning_rate": 1.7456896551724137e-05,
+ "loss": 0.0226,
+ "step": 1150
+ },
+ {
+ "epoch": 5.8,
+ "learning_rate": 1.709770114942529e-05,
+ "loss": 0.0166,
+ "step": 1155
+ },
+ {
+ "epoch": 5.83,
+ "learning_rate": 1.6738505747126437e-05,
+ "loss": 0.1241,
+ "step": 1160
+ },
+ {
+ "epoch": 5.85,
+ "learning_rate": 1.6379310344827585e-05,
+ "loss": 0.0243,
+ "step": 1165
+ },
+ {
+ "epoch": 5.88,
+ "learning_rate": 1.6020114942528737e-05,
+ "loss": 0.0681,
+ "step": 1170
+ },
+ {
+ "epoch": 5.9,
+ "learning_rate": 1.5660919540229885e-05,
+ "loss": 0.0099,
+ "step": 1175
+ },
+ {
+ "epoch": 5.93,
+ "learning_rate": 1.5301724137931033e-05,
+ "loss": 0.0196,
+ "step": 1180
+ },
+ {
+ "epoch": 5.95,
+ "learning_rate": 1.4942528735632185e-05,
+ "loss": 0.1818,
+ "step": 1185
+ },
+ {
+ "epoch": 5.98,
+ "learning_rate": 1.4583333333333335e-05,
+ "loss": 0.0057,
+ "step": 1190
+ },
+ {
+ "epoch": 6.0,
+ "eval_accuracy": 0.9943422913719944,
+ "eval_loss": 0.022870095446705818,
+ "eval_runtime": 102.5148,
+ "eval_samples_per_second": 20.69,
+ "eval_steps_per_second": 2.595,
+ "step": 1194
+ },
+ {
+ "epoch": 6.01,
+ "learning_rate": 1.4224137931034485e-05,
+ "loss": 0.0652,
+ "step": 1195
+ },
+ {
+ "epoch": 6.03,
+ "learning_rate": 1.3864942528735633e-05,
+ "loss": 0.0672,
+ "step": 1200
+ },
+ {
+ "epoch": 6.06,
+ "learning_rate": 1.3505747126436783e-05,
+ "loss": 0.0322,
+ "step": 1205
+ },
+ {
+ "epoch": 6.08,
+ "learning_rate": 1.3146551724137931e-05,
+ "loss": 0.0374,
+ "step": 1210
+ },
+ {
+ "epoch": 6.11,
+ "learning_rate": 1.2787356321839081e-05,
+ "loss": 0.0112,
+ "step": 1215
+ },
+ {
+ "epoch": 6.13,
+ "learning_rate": 1.242816091954023e-05,
+ "loss": 0.0284,
+ "step": 1220
+ },
+ {
+ "epoch": 6.16,
+ "learning_rate": 1.206896551724138e-05,
+ "loss": 0.0688,
+ "step": 1225
+ },
+ {
+ "epoch": 6.18,
+ "learning_rate": 1.1709770114942529e-05,
+ "loss": 0.0271,
+ "step": 1230
+ },
+ {
+ "epoch": 6.21,
+ "learning_rate": 1.1350574712643679e-05,
+ "loss": 0.0048,
+ "step": 1235
+ },
+ {
+ "epoch": 6.23,
+ "learning_rate": 1.0991379310344827e-05,
+ "loss": 0.0699,
+ "step": 1240
+ },
+ {
+ "epoch": 6.26,
+ "learning_rate": 1.0632183908045977e-05,
+ "loss": 0.0951,
+ "step": 1245
+ },
+ {
+ "epoch": 6.28,
+ "learning_rate": 1.0272988505747127e-05,
+ "loss": 0.0334,
+ "step": 1250
+ },
+ {
+ "epoch": 6.31,
+ "learning_rate": 9.913793103448277e-06,
+ "loss": 0.061,
+ "step": 1255
+ },
+ {
+ "epoch": 6.33,
+ "learning_rate": 9.554597701149425e-06,
+ "loss": 0.0025,
+ "step": 1260
+ },
+ {
+ "epoch": 6.36,
+ "learning_rate": 9.195402298850575e-06,
+ "loss": 0.0532,
+ "step": 1265
+ },
+ {
+ "epoch": 6.38,
+ "learning_rate": 8.836206896551725e-06,
+ "loss": 0.0359,
+ "step": 1270
+ },
+ {
+ "epoch": 6.41,
+ "learning_rate": 8.477011494252873e-06,
+ "loss": 0.0034,
+ "step": 1275
+ },
+ {
+ "epoch": 6.43,
+ "learning_rate": 8.117816091954025e-06,
+ "loss": 0.0309,
+ "step": 1280
+ },
+ {
+ "epoch": 6.46,
+ "learning_rate": 7.758620689655173e-06,
+ "loss": 0.012,
+ "step": 1285
+ },
+ {
+ "epoch": 6.48,
+ "learning_rate": 7.399425287356322e-06,
+ "loss": 0.0651,
+ "step": 1290
+ },
+ {
+ "epoch": 6.51,
+ "learning_rate": 7.040229885057471e-06,
+ "loss": 0.0261,
+ "step": 1295
+ },
+ {
+ "epoch": 6.53,
+ "learning_rate": 6.68103448275862e-06,
+ "loss": 0.0366,
+ "step": 1300
+ },
+ {
+ "epoch": 6.56,
+ "learning_rate": 6.321839080459771e-06,
+ "loss": 0.0095,
+ "step": 1305
+ },
+ {
+ "epoch": 6.58,
+ "learning_rate": 5.96264367816092e-06,
+ "loss": 0.0244,
+ "step": 1310
+ },
+ {
+ "epoch": 6.61,
+ "learning_rate": 5.603448275862069e-06,
+ "loss": 0.0133,
+ "step": 1315
+ },
+ {
+ "epoch": 6.63,
+ "learning_rate": 5.244252873563219e-06,
+ "loss": 0.0611,
+ "step": 1320
+ },
+ {
+ "epoch": 6.66,
+ "learning_rate": 4.885057471264369e-06,
+ "loss": 0.0558,
+ "step": 1325
+ },
+ {
+ "epoch": 6.68,
+ "learning_rate": 4.525862068965518e-06,
+ "loss": 0.0074,
+ "step": 1330
+ },
+ {
+ "epoch": 6.71,
+ "learning_rate": 4.166666666666667e-06,
+ "loss": 0.0219,
+ "step": 1335
+ },
+ {
+ "epoch": 6.73,
+ "learning_rate": 3.8074712643678163e-06,
+ "loss": 0.0106,
+ "step": 1340
+ },
+ {
+ "epoch": 6.76,
+ "learning_rate": 3.448275862068966e-06,
+ "loss": 0.0009,
+ "step": 1345
+ },
+ {
+ "epoch": 6.78,
+ "learning_rate": 3.0890804597701153e-06,
+ "loss": 0.0028,
+ "step": 1350
+ },
+ {
+ "epoch": 6.81,
+ "learning_rate": 2.729885057471265e-06,
+ "loss": 0.018,
+ "step": 1355
+ },
+ {
+ "epoch": 6.83,
+ "learning_rate": 2.370689655172414e-06,
+ "loss": 0.0383,
+ "step": 1360
+ },
+ {
+ "epoch": 6.86,
+ "learning_rate": 2.0114942528735633e-06,
+ "loss": 0.0463,
+ "step": 1365
+ },
+ {
+ "epoch": 6.88,
+ "learning_rate": 1.6522988505747128e-06,
+ "loss": 0.0113,
+ "step": 1370
+ },
+ {
+ "epoch": 6.91,
+ "learning_rate": 1.293103448275862e-06,
+ "loss": 0.0684,
+ "step": 1375
+ },
+ {
+ "epoch": 6.93,
+ "learning_rate": 9.339080459770116e-07,
+ "loss": 0.0319,
+ "step": 1380
+ },
+ {
+ "epoch": 6.96,
+ "learning_rate": 5.747126436781609e-07,
+ "loss": 0.0689,
+ "step": 1385
+ },
+ {
+ "epoch": 6.98,
+ "learning_rate": 2.1551724137931036e-07,
+ "loss": 0.0017,
+ "step": 1390
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.9938708156529938,
+ "eval_loss": 0.026258554309606552,
+ "eval_runtime": 102.4925,
+ "eval_samples_per_second": 20.694,
+ "eval_steps_per_second": 2.595,
+ "step": 1393
+ },
+ {
+ "epoch": 7.0,
+ "step": 1393,
+ "total_flos": 1.0256650917539217e+19,
+ "train_loss": 0.15325962240578392,
+ "train_runtime": 8636.3959,
+ "train_samples_per_second": 5.157,
+ "train_steps_per_second": 0.161
+ }
+ ],
+ "max_steps": 1393,
+ "num_train_epochs": 7,
+ "total_flos": 1.0256650917539217e+19,
+ "trial_name": null,
+ "trial_params": null
+ }
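
The evaluation entries in the log above show the lowest `eval_loss` at step 1194 (epoch 6.0), not at the final step 1393. The metrics reported in `eval_results.json` match the step-1194 values, which would be consistent with the best checkpoint being reloaded at the end of training, although the log itself does not confirm that setting. A minimal sketch of how such a log could be scanned for its best evaluation entry follows. It assumes the entries sit under the standard `log_history` key of `trainer_state.json`; the helper name `best_eval_entry` is hypothetical, and only the three evaluation entries visible above are embedded as sample data.

```python
import json


def best_eval_entry(log_history, metric="eval_loss", greater_is_better=False):
    """Return the log entry with the best value of `metric`, or None if absent.

    Hypothetical helper for illustration; not part of any library.
    """
    # Training-step entries carry "loss", not "eval_loss", so they are skipped.
    evals = [entry for entry in log_history if metric in entry]
    if not evals:
        return None
    pick = max if greater_is_better else min
    return pick(evals, key=lambda entry: entry[metric])


# Evaluation entries copied from the log above (epochs 5 to 7 only).
# The "log_history" key is assumed from the usual trainer_state.json layout.
sample_state = json.loads("""
{"log_history": [
  {"epoch": 5.0, "eval_accuracy": 0.9886845827439887,
   "eval_loss": 0.03925405070185661, "step": 995},
  {"epoch": 6.0, "eval_accuracy": 0.9943422913719944,
   "eval_loss": 0.022870095446705818, "step": 1194},
  {"epoch": 7.0, "eval_accuracy": 0.9938708156529938,
   "eval_loss": 0.026258554309606552, "step": 1393}
]}
""")

best = best_eval_entry(sample_state["log_history"])
print("best step by eval_loss:", best["step"])
```

Passing `metric="eval_accuracy", greater_is_better=True` selects by accuracy instead. On these three entries both criteria pick the same step.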