selmamalak committed
Commit b720887
1 Parent(s): a7cf66a

End of training

Files changed (5)
  1. README.md +4 -4
  2. all_results.json +16 -0
  3. eval_results.json +11 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1459 -0
README.md CHANGED
@@ -23,11 +23,11 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the medmnist-v2 dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0627
+ - Loss: 0.0692
  - Accuracy: 0.9790
- - Precision: 0.9764
- - Recall: 0.9812
- - F1: 0.9786
+ - Precision: 0.9772
+ - Recall: 0.9785
+ - F1: 0.9778
 
  ## Model description
 
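The card above describes a LoRA fine-tune of the base ViT checkpoint (the adapter directory named in trainer_state.json is `vit-base-patch16-224-in21k-finetuned-lora-medmnistv2`). Below is a minimal sketch of how such an adapter could be loaded for inference; the repository id and the number of MedMNIST classes are assumptions, not part of this commit.

```python
# Minimal sketch, assuming the adapter is published under the committer's namespace
# and that the classifier head was saved together with the LoRA weights.
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoImageProcessor, AutoModelForImageClassification

base_id = "google/vit-base-patch16-224-in21k"
adapter_id = "selmamalak/vit-base-patch16-224-in21k-finetuned-lora-medmnistv2"  # assumed repo id
num_classes = 9  # placeholder: set to the label count of the MedMNIST v2 subset used

processor = AutoImageProcessor.from_pretrained(base_id)
base_model = AutoModelForImageClassification.from_pretrained(base_id, num_labels=num_classes)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the LoRA adapter
model.eval()

image = Image.open("example.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted class index:", logits.argmax(-1).item())
```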
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9789535223618825,
+ "eval_f1": 0.9778395609614629,
+ "eval_loss": 0.06918257474899292,
+ "eval_precision": 0.9772213544264792,
+ "eval_recall": 0.9785446056392397,
+ "eval_runtime": 18.1272,
+ "eval_samples_per_second": 188.722,
+ "eval_steps_per_second": 11.805,
+ "total_flos": 9.332136680499118e+18,
+ "train_loss": 0.29726991015959553,
+ "train_runtime": 1395.7704,
+ "train_samples_per_second": 85.68,
+ "train_steps_per_second": 1.34
+ }
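all_results.json merges the training and evaluation summaries. For context, files like these are the standard output of the `transformers.Trainer` metric helpers; a sketch of the calls that typically produce them is below (the actual training script is not part of this commit, so treat it as illustrative only).

```python
# Illustrative only: standard transformers.Trainer calls that emit the result files
# added in this commit; the Trainer construction and datasets are assumed elsewhere.
from transformers import Trainer

def train_and_report(trainer: Trainer) -> None:
    train_result = trainer.train()
    trainer.log_metrics("train", train_result.metrics)
    trainer.save_metrics("train", train_result.metrics)  # writes train_results.json, merged into all_results.json
    trainer.save_state()                                  # writes trainer_state.json

    eval_metrics = trainer.evaluate()
    trainer.log_metrics("eval", eval_metrics)
    trainer.save_metrics("eval", eval_metrics)            # writes eval_results.json, merged into all_results.json
```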
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9789535223618825,
+ "eval_f1": 0.9778395609614629,
+ "eval_loss": 0.06918257474899292,
+ "eval_precision": 0.9772213544264792,
+ "eval_recall": 0.9785446056392397,
+ "eval_runtime": 18.1272,
+ "eval_samples_per_second": 188.722,
+ "eval_steps_per_second": 11.805
+ }
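The eval_* fields above are the accuracy, precision, recall, and F1 computed on the evaluation split. A `compute_metrics` function of roughly this shape would produce them; the macro averaging mode is an assumption (consistent with precision and recall differing from accuracy, but not confirmed by the commit).

```python
# Sketch of a Trainer-style compute_metrics that yields fields like those in
# eval_results.json; macro averaging is an assumption, not taken from the commit.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="macro")
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```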
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 10.0,
+ "total_flos": 9.332136680499118e+18,
+ "train_loss": 0.29726991015959553,
+ "train_runtime": 1395.7704,
+ "train_samples_per_second": 85.68,
+ "train_steps_per_second": 1.34
+ }
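The throughput figures in train_results.json are internally consistent with the 1870 optimizer steps and 10 epochs recorded in trainer_state.json; a quick check:

```python
# Values copied from train_results.json / trainer_state.json in this commit.
train_runtime = 1395.7704          # seconds
steps_per_second = 1.34
samples_per_second = 85.68
num_train_epochs = 10

print(round(steps_per_second * train_runtime))                       # ~1870 optimizer steps in total
print(round(samples_per_second * train_runtime))                     # ~119,600 training samples processed
print(round(samples_per_second * train_runtime / num_train_epochs))  # ~11,960 samples per epoch
```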
trainer_state.json ADDED
@@ -0,0 +1,1459 @@
1
+ {
2
+ "best_metric": 0.9789719626168224,
3
+ "best_model_checkpoint": "vit-base-patch16-224-in21k-finetuned-lora-medmnistv2/checkpoint-1870",
4
+ "epoch": 10.0,
5
+ "eval_steps": 500,
6
+ "global_step": 1870,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.05,
13
+ "grad_norm": 0.9260491132736206,
14
+ "learning_rate": 0.004973262032085562,
15
+ "loss": 1.5983,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.11,
20
+ "grad_norm": 1.1307735443115234,
21
+ "learning_rate": 0.004946524064171123,
22
+ "loss": 0.9417,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.16,
27
+ "grad_norm": 0.9537946581840515,
28
+ "learning_rate": 0.004919786096256685,
29
+ "loss": 0.7642,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.21,
34
+ "grad_norm": 0.8597701191902161,
35
+ "learning_rate": 0.004893048128342246,
36
+ "loss": 0.6992,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.27,
41
+ "grad_norm": 1.104675531387329,
42
+ "learning_rate": 0.004866310160427808,
43
+ "loss": 0.627,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.32,
48
+ "grad_norm": 0.846555233001709,
49
+ "learning_rate": 0.004839572192513369,
50
+ "loss": 0.5047,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.37,
55
+ "grad_norm": 1.423182487487793,
56
+ "learning_rate": 0.004812834224598931,
57
+ "loss": 0.5431,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.43,
62
+ "grad_norm": 0.8424627780914307,
63
+ "learning_rate": 0.004786096256684492,
64
+ "loss": 0.5962,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.48,
69
+ "grad_norm": 0.6608781814575195,
70
+ "learning_rate": 0.004759358288770054,
71
+ "loss": 0.4084,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.53,
76
+ "grad_norm": 1.130247712135315,
77
+ "learning_rate": 0.004732620320855615,
78
+ "loss": 0.4932,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.59,
83
+ "grad_norm": 0.6054658889770508,
84
+ "learning_rate": 0.004705882352941177,
85
+ "loss": 0.4684,
86
+ "step": 110
87
+ },
88
+ {
89
+ "epoch": 0.64,
90
+ "grad_norm": 0.8725093603134155,
91
+ "learning_rate": 0.004679144385026738,
92
+ "loss": 0.4429,
93
+ "step": 120
94
+ },
95
+ {
96
+ "epoch": 0.7,
97
+ "grad_norm": 0.6343618035316467,
98
+ "learning_rate": 0.0046524064171123,
99
+ "loss": 0.3952,
100
+ "step": 130
101
+ },
102
+ {
103
+ "epoch": 0.75,
104
+ "grad_norm": 0.9175045490264893,
105
+ "learning_rate": 0.0046256684491978615,
106
+ "loss": 0.4592,
107
+ "step": 140
108
+ },
109
+ {
110
+ "epoch": 0.8,
111
+ "grad_norm": 1.0295114517211914,
112
+ "learning_rate": 0.004598930481283423,
113
+ "loss": 0.4212,
114
+ "step": 150
115
+ },
116
+ {
117
+ "epoch": 0.86,
118
+ "grad_norm": 0.4232007563114166,
119
+ "learning_rate": 0.004572192513368984,
120
+ "loss": 0.4165,
121
+ "step": 160
122
+ },
123
+ {
124
+ "epoch": 0.91,
125
+ "grad_norm": 1.18360435962677,
126
+ "learning_rate": 0.00454812834224599,
127
+ "loss": 0.4245,
128
+ "step": 170
129
+ },
130
+ {
131
+ "epoch": 0.96,
132
+ "grad_norm": 0.7265322804450989,
133
+ "learning_rate": 0.004521390374331551,
134
+ "loss": 0.4059,
135
+ "step": 180
136
+ },
137
+ {
138
+ "epoch": 1.0,
139
+ "eval_accuracy": 0.9310747663551402,
140
+ "eval_f1": 0.9201346862223367,
141
+ "eval_loss": 0.18775394558906555,
142
+ "eval_precision": 0.913178148427007,
143
+ "eval_recall": 0.9327948208695145,
144
+ "eval_runtime": 9.5068,
145
+ "eval_samples_per_second": 180.082,
146
+ "eval_steps_per_second": 11.255,
147
+ "step": 187
148
+ },
149
+ {
150
+ "epoch": 1.02,
151
+ "grad_norm": 1.179917335510254,
152
+ "learning_rate": 0.004494652406417113,
153
+ "loss": 0.3646,
154
+ "step": 190
155
+ },
156
+ {
157
+ "epoch": 1.07,
158
+ "grad_norm": 1.1189391613006592,
159
+ "learning_rate": 0.004467914438502674,
160
+ "loss": 0.4339,
161
+ "step": 200
162
+ },
163
+ {
164
+ "epoch": 1.12,
165
+ "grad_norm": 0.8059839010238647,
166
+ "learning_rate": 0.004441176470588235,
167
+ "loss": 0.373,
168
+ "step": 210
169
+ },
170
+ {
171
+ "epoch": 1.18,
172
+ "grad_norm": 1.5934990644454956,
173
+ "learning_rate": 0.004414438502673797,
174
+ "loss": 0.4089,
175
+ "step": 220
176
+ },
177
+ {
178
+ "epoch": 1.23,
179
+ "grad_norm": 0.5738559365272522,
180
+ "learning_rate": 0.004387700534759359,
181
+ "loss": 0.4181,
182
+ "step": 230
183
+ },
184
+ {
185
+ "epoch": 1.28,
186
+ "grad_norm": 1.0053284168243408,
187
+ "learning_rate": 0.00436096256684492,
188
+ "loss": 0.354,
189
+ "step": 240
190
+ },
191
+ {
192
+ "epoch": 1.34,
193
+ "grad_norm": 0.6736829280853271,
194
+ "learning_rate": 0.004334224598930481,
195
+ "loss": 0.2862,
196
+ "step": 250
197
+ },
198
+ {
199
+ "epoch": 1.39,
200
+ "grad_norm": 0.7684084177017212,
201
+ "learning_rate": 0.0043074866310160425,
202
+ "loss": 0.3533,
203
+ "step": 260
204
+ },
205
+ {
206
+ "epoch": 1.44,
207
+ "grad_norm": 1.04612135887146,
208
+ "learning_rate": 0.004280748663101605,
209
+ "loss": 0.3654,
210
+ "step": 270
211
+ },
212
+ {
213
+ "epoch": 1.5,
214
+ "grad_norm": 0.7823394536972046,
215
+ "learning_rate": 0.004254010695187166,
216
+ "loss": 0.4385,
217
+ "step": 280
218
+ },
219
+ {
220
+ "epoch": 1.55,
221
+ "grad_norm": 0.9472429752349854,
222
+ "learning_rate": 0.004227272727272727,
223
+ "loss": 0.4417,
224
+ "step": 290
225
+ },
226
+ {
227
+ "epoch": 1.6,
228
+ "grad_norm": 0.889252245426178,
229
+ "learning_rate": 0.004200534759358289,
230
+ "loss": 0.3873,
231
+ "step": 300
232
+ },
233
+ {
234
+ "epoch": 1.66,
235
+ "grad_norm": 0.7252718806266785,
236
+ "learning_rate": 0.00417379679144385,
237
+ "loss": 0.3717,
238
+ "step": 310
239
+ },
240
+ {
241
+ "epoch": 1.71,
242
+ "grad_norm": 0.8687788844108582,
243
+ "learning_rate": 0.004147058823529412,
244
+ "loss": 0.3854,
245
+ "step": 320
246
+ },
247
+ {
248
+ "epoch": 1.76,
249
+ "grad_norm": 0.6197172999382019,
250
+ "learning_rate": 0.004122994652406417,
251
+ "loss": 0.3748,
252
+ "step": 330
253
+ },
254
+ {
255
+ "epoch": 1.82,
256
+ "grad_norm": 0.6506063342094421,
257
+ "learning_rate": 0.004096256684491978,
258
+ "loss": 0.2923,
259
+ "step": 340
260
+ },
261
+ {
262
+ "epoch": 1.87,
263
+ "grad_norm": 0.5267966389656067,
264
+ "learning_rate": 0.00406951871657754,
265
+ "loss": 0.4045,
266
+ "step": 350
267
+ },
268
+ {
269
+ "epoch": 1.93,
270
+ "grad_norm": 1.1251919269561768,
271
+ "learning_rate": 0.004042780748663102,
272
+ "loss": 0.3988,
273
+ "step": 360
274
+ },
275
+ {
276
+ "epoch": 1.98,
277
+ "grad_norm": 1.114890456199646,
278
+ "learning_rate": 0.004016042780748663,
279
+ "loss": 0.3796,
280
+ "step": 370
281
+ },
282
+ {
283
+ "epoch": 2.0,
284
+ "eval_accuracy": 0.9082943925233645,
285
+ "eval_f1": 0.886066241884805,
286
+ "eval_loss": 0.27294662594795227,
287
+ "eval_precision": 0.9131012141299326,
288
+ "eval_recall": 0.887497540228883,
289
+ "eval_runtime": 9.2331,
290
+ "eval_samples_per_second": 185.419,
291
+ "eval_steps_per_second": 11.589,
292
+ "step": 374
293
+ },
294
+ {
295
+ "epoch": 2.03,
296
+ "grad_norm": 0.7117612361907959,
297
+ "learning_rate": 0.003989304812834224,
298
+ "loss": 0.3724,
299
+ "step": 380
300
+ },
301
+ {
302
+ "epoch": 2.09,
303
+ "grad_norm": 0.9159232974052429,
304
+ "learning_rate": 0.00396524064171123,
305
+ "loss": 0.3155,
306
+ "step": 390
307
+ },
308
+ {
309
+ "epoch": 2.14,
310
+ "grad_norm": 0.6797966957092285,
311
+ "learning_rate": 0.003938502673796792,
312
+ "loss": 0.3531,
313
+ "step": 400
314
+ },
315
+ {
316
+ "epoch": 2.19,
317
+ "grad_norm": 0.912696361541748,
318
+ "learning_rate": 0.003911764705882353,
319
+ "loss": 0.2788,
320
+ "step": 410
321
+ },
322
+ {
323
+ "epoch": 2.25,
324
+ "grad_norm": 1.0336519479751587,
325
+ "learning_rate": 0.0038850267379679144,
326
+ "loss": 0.3692,
327
+ "step": 420
328
+ },
329
+ {
330
+ "epoch": 2.3,
331
+ "grad_norm": 0.8013398051261902,
332
+ "learning_rate": 0.003858288770053476,
333
+ "loss": 0.3561,
334
+ "step": 430
335
+ },
336
+ {
337
+ "epoch": 2.35,
338
+ "grad_norm": 0.6950948238372803,
339
+ "learning_rate": 0.003831550802139038,
340
+ "loss": 0.3295,
341
+ "step": 440
342
+ },
343
+ {
344
+ "epoch": 2.41,
345
+ "grad_norm": 0.7441625595092773,
346
+ "learning_rate": 0.003804812834224599,
347
+ "loss": 0.3285,
348
+ "step": 450
349
+ },
350
+ {
351
+ "epoch": 2.46,
352
+ "grad_norm": 4.745124816894531,
353
+ "learning_rate": 0.0037780748663101605,
354
+ "loss": 0.4162,
355
+ "step": 460
356
+ },
357
+ {
358
+ "epoch": 2.51,
359
+ "grad_norm": 1.3873414993286133,
360
+ "learning_rate": 0.003751336898395722,
361
+ "loss": 0.3424,
362
+ "step": 470
363
+ },
364
+ {
365
+ "epoch": 2.57,
366
+ "grad_norm": 0.7891167402267456,
367
+ "learning_rate": 0.0037272727272727275,
368
+ "loss": 0.3043,
369
+ "step": 480
370
+ },
371
+ {
372
+ "epoch": 2.62,
373
+ "grad_norm": 1.013873815536499,
374
+ "learning_rate": 0.003700534759358289,
375
+ "loss": 0.3754,
376
+ "step": 490
377
+ },
378
+ {
379
+ "epoch": 2.67,
380
+ "grad_norm": 0.9377150535583496,
381
+ "learning_rate": 0.00367379679144385,
382
+ "loss": 0.3675,
383
+ "step": 500
384
+ },
385
+ {
386
+ "epoch": 2.73,
387
+ "grad_norm": 2.7368648052215576,
388
+ "learning_rate": 0.0036470588235294117,
389
+ "loss": 0.2901,
390
+ "step": 510
391
+ },
392
+ {
393
+ "epoch": 2.78,
394
+ "grad_norm": 1.5487793684005737,
395
+ "learning_rate": 0.0036203208556149736,
396
+ "loss": 0.482,
397
+ "step": 520
398
+ },
399
+ {
400
+ "epoch": 2.83,
401
+ "grad_norm": 8.680522918701172,
402
+ "learning_rate": 0.003593582887700535,
403
+ "loss": 0.378,
404
+ "step": 530
405
+ },
406
+ {
407
+ "epoch": 2.89,
408
+ "grad_norm": 1.3777785301208496,
409
+ "learning_rate": 0.0035668449197860962,
410
+ "loss": 0.4919,
411
+ "step": 540
412
+ },
413
+ {
414
+ "epoch": 2.94,
415
+ "grad_norm": 2.1192550659179688,
416
+ "learning_rate": 0.0035401069518716578,
417
+ "loss": 0.3751,
418
+ "step": 550
419
+ },
420
+ {
421
+ "epoch": 2.99,
422
+ "grad_norm": 9.656478881835938,
423
+ "learning_rate": 0.0035133689839572193,
424
+ "loss": 0.424,
425
+ "step": 560
426
+ },
427
+ {
428
+ "epoch": 3.0,
429
+ "eval_accuracy": 0.866822429906542,
430
+ "eval_f1": 0.8491520459723211,
431
+ "eval_loss": 0.3701097071170807,
432
+ "eval_precision": 0.8797339861417046,
433
+ "eval_recall": 0.8520089249800192,
434
+ "eval_runtime": 9.219,
435
+ "eval_samples_per_second": 185.702,
436
+ "eval_steps_per_second": 11.606,
437
+ "step": 561
438
+ },
439
+ {
440
+ "epoch": 3.05,
441
+ "grad_norm": 1.5421924591064453,
442
+ "learning_rate": 0.0034866310160427804,
443
+ "loss": 0.4643,
444
+ "step": 570
445
+ },
446
+ {
447
+ "epoch": 3.1,
448
+ "grad_norm": 0.9370782375335693,
449
+ "learning_rate": 0.0034598930481283424,
450
+ "loss": 0.4274,
451
+ "step": 580
452
+ },
453
+ {
454
+ "epoch": 3.16,
455
+ "grad_norm": 1.6456141471862793,
456
+ "learning_rate": 0.003433155080213904,
457
+ "loss": 0.3616,
458
+ "step": 590
459
+ },
460
+ {
461
+ "epoch": 3.21,
462
+ "grad_norm": 1.2138258218765259,
463
+ "learning_rate": 0.0034064171122994654,
464
+ "loss": 0.4241,
465
+ "step": 600
466
+ },
467
+ {
468
+ "epoch": 3.26,
469
+ "grad_norm": 0.8959400057792664,
470
+ "learning_rate": 0.0033796791443850265,
471
+ "loss": 0.3392,
472
+ "step": 610
473
+ },
474
+ {
475
+ "epoch": 3.32,
476
+ "grad_norm": 0.8747026324272156,
477
+ "learning_rate": 0.003352941176470588,
478
+ "loss": 0.3533,
479
+ "step": 620
480
+ },
481
+ {
482
+ "epoch": 3.37,
483
+ "grad_norm": 1.7161656618118286,
484
+ "learning_rate": 0.00332620320855615,
485
+ "loss": 0.3407,
486
+ "step": 630
487
+ },
488
+ {
489
+ "epoch": 3.42,
490
+ "grad_norm": 0.9229569435119629,
491
+ "learning_rate": 0.0032994652406417115,
492
+ "loss": 0.3098,
493
+ "step": 640
494
+ },
495
+ {
496
+ "epoch": 3.48,
497
+ "grad_norm": 0.9468969702720642,
498
+ "learning_rate": 0.0032727272727272726,
499
+ "loss": 0.3896,
500
+ "step": 650
501
+ },
502
+ {
503
+ "epoch": 3.53,
504
+ "grad_norm": 1.4430208206176758,
505
+ "learning_rate": 0.003245989304812834,
506
+ "loss": 0.3395,
507
+ "step": 660
508
+ },
509
+ {
510
+ "epoch": 3.58,
511
+ "grad_norm": 1.20052969455719,
512
+ "learning_rate": 0.0032192513368983957,
513
+ "loss": 0.3448,
514
+ "step": 670
515
+ },
516
+ {
517
+ "epoch": 3.64,
518
+ "grad_norm": 1.1726669073104858,
519
+ "learning_rate": 0.0031925133689839577,
520
+ "loss": 0.342,
521
+ "step": 680
522
+ },
523
+ {
524
+ "epoch": 3.69,
525
+ "grad_norm": 0.7881722450256348,
526
+ "learning_rate": 0.0031657754010695188,
527
+ "loss": 0.301,
528
+ "step": 690
529
+ },
530
+ {
531
+ "epoch": 3.74,
532
+ "grad_norm": 0.7960072159767151,
533
+ "learning_rate": 0.0031390374331550803,
534
+ "loss": 0.2633,
535
+ "step": 700
536
+ },
537
+ {
538
+ "epoch": 3.8,
539
+ "grad_norm": 0.964872419834137,
540
+ "learning_rate": 0.003112299465240642,
541
+ "loss": 0.2691,
542
+ "step": 710
543
+ },
544
+ {
545
+ "epoch": 3.85,
546
+ "grad_norm": 0.9894037246704102,
547
+ "learning_rate": 0.003085561497326203,
548
+ "loss": 0.2859,
549
+ "step": 720
550
+ },
551
+ {
552
+ "epoch": 3.9,
553
+ "grad_norm": 1.0027267932891846,
554
+ "learning_rate": 0.003058823529411765,
555
+ "loss": 0.3027,
556
+ "step": 730
557
+ },
558
+ {
559
+ "epoch": 3.96,
560
+ "grad_norm": 1.0325654745101929,
561
+ "learning_rate": 0.0030320855614973264,
562
+ "loss": 0.3141,
563
+ "step": 740
564
+ },
565
+ {
566
+ "epoch": 4.0,
567
+ "eval_accuracy": 0.9380841121495327,
568
+ "eval_f1": 0.9283105641226367,
569
+ "eval_loss": 0.18485769629478455,
570
+ "eval_precision": 0.9266830676466586,
571
+ "eval_recall": 0.9336478146798447,
572
+ "eval_runtime": 9.3787,
573
+ "eval_samples_per_second": 182.542,
574
+ "eval_steps_per_second": 11.409,
575
+ "step": 748
576
+ },
577
+ {
578
+ "epoch": 4.01,
579
+ "grad_norm": 1.263634443283081,
580
+ "learning_rate": 0.003005347593582888,
581
+ "loss": 0.3592,
582
+ "step": 750
583
+ },
584
+ {
585
+ "epoch": 4.06,
586
+ "grad_norm": 1.8158007860183716,
587
+ "learning_rate": 0.002978609625668449,
588
+ "loss": 0.364,
589
+ "step": 760
590
+ },
591
+ {
592
+ "epoch": 4.12,
593
+ "grad_norm": 0.9459696412086487,
594
+ "learning_rate": 0.0029518716577540106,
595
+ "loss": 0.3587,
596
+ "step": 770
597
+ },
598
+ {
599
+ "epoch": 4.17,
600
+ "grad_norm": 0.7624779343605042,
601
+ "learning_rate": 0.0029251336898395725,
602
+ "loss": 0.304,
603
+ "step": 780
604
+ },
605
+ {
606
+ "epoch": 4.22,
607
+ "grad_norm": 0.8625235557556152,
608
+ "learning_rate": 0.002898395721925134,
609
+ "loss": 0.2726,
610
+ "step": 790
611
+ },
612
+ {
613
+ "epoch": 4.28,
614
+ "grad_norm": 0.962257444858551,
615
+ "learning_rate": 0.002871657754010695,
616
+ "loss": 0.2601,
617
+ "step": 800
618
+ },
619
+ {
620
+ "epoch": 4.33,
621
+ "grad_norm": 0.6333624720573425,
622
+ "learning_rate": 0.0028449197860962567,
623
+ "loss": 0.3448,
624
+ "step": 810
625
+ },
626
+ {
627
+ "epoch": 4.39,
628
+ "grad_norm": 1.3983910083770752,
629
+ "learning_rate": 0.002818181818181818,
630
+ "loss": 0.3202,
631
+ "step": 820
632
+ },
633
+ {
634
+ "epoch": 4.44,
635
+ "grad_norm": 0.6626348495483398,
636
+ "learning_rate": 0.00279144385026738,
637
+ "loss": 0.2529,
638
+ "step": 830
639
+ },
640
+ {
641
+ "epoch": 4.49,
642
+ "grad_norm": 0.8221544027328491,
643
+ "learning_rate": 0.0027647058823529413,
644
+ "loss": 0.2523,
645
+ "step": 840
646
+ },
647
+ {
648
+ "epoch": 4.55,
649
+ "grad_norm": 0.7872591018676758,
650
+ "learning_rate": 0.002737967914438503,
651
+ "loss": 0.2832,
652
+ "step": 850
653
+ },
654
+ {
655
+ "epoch": 4.6,
656
+ "grad_norm": 1.50129234790802,
657
+ "learning_rate": 0.0027112299465240643,
658
+ "loss": 0.2912,
659
+ "step": 860
660
+ },
661
+ {
662
+ "epoch": 4.65,
663
+ "grad_norm": 0.7471727728843689,
664
+ "learning_rate": 0.0026844919786096254,
665
+ "loss": 0.3097,
666
+ "step": 870
667
+ },
668
+ {
669
+ "epoch": 4.71,
670
+ "grad_norm": 0.6078329086303711,
671
+ "learning_rate": 0.002657754010695187,
672
+ "loss": 0.2657,
673
+ "step": 880
674
+ },
675
+ {
676
+ "epoch": 4.76,
677
+ "grad_norm": 0.8674110174179077,
678
+ "learning_rate": 0.002631016042780749,
679
+ "loss": 0.2633,
680
+ "step": 890
681
+ },
682
+ {
683
+ "epoch": 4.81,
684
+ "grad_norm": 0.5421575307846069,
685
+ "learning_rate": 0.0026042780748663104,
686
+ "loss": 0.257,
687
+ "step": 900
688
+ },
689
+ {
690
+ "epoch": 4.87,
691
+ "grad_norm": 1.314867377281189,
692
+ "learning_rate": 0.0025775401069518715,
693
+ "loss": 0.2688,
694
+ "step": 910
695
+ },
696
+ {
697
+ "epoch": 4.92,
698
+ "grad_norm": 0.698221743106842,
699
+ "learning_rate": 0.002550802139037433,
700
+ "loss": 0.2506,
701
+ "step": 920
702
+ },
703
+ {
704
+ "epoch": 4.97,
705
+ "grad_norm": 0.5437451004981995,
706
+ "learning_rate": 0.0025240641711229946,
707
+ "loss": 0.2553,
708
+ "step": 930
709
+ },
710
+ {
711
+ "epoch": 5.0,
712
+ "eval_accuracy": 0.9643691588785047,
713
+ "eval_f1": 0.9617344813251135,
714
+ "eval_loss": 0.1074606254696846,
715
+ "eval_precision": 0.9630090863077152,
716
+ "eval_recall": 0.9611619604560873,
717
+ "eval_runtime": 9.213,
718
+ "eval_samples_per_second": 185.824,
719
+ "eval_steps_per_second": 11.614,
720
+ "step": 935
721
+ },
722
+ {
723
+ "epoch": 5.03,
724
+ "grad_norm": 0.9639925956726074,
725
+ "learning_rate": 0.002497326203208556,
726
+ "loss": 0.2186,
727
+ "step": 940
728
+ },
729
+ {
730
+ "epoch": 5.08,
731
+ "grad_norm": 1.0346194505691528,
732
+ "learning_rate": 0.0024705882352941176,
733
+ "loss": 0.3163,
734
+ "step": 950
735
+ },
736
+ {
737
+ "epoch": 5.13,
738
+ "grad_norm": 0.9101438522338867,
739
+ "learning_rate": 0.002443850267379679,
740
+ "loss": 0.257,
741
+ "step": 960
742
+ },
743
+ {
744
+ "epoch": 5.19,
745
+ "grad_norm": 0.9387779831886292,
746
+ "learning_rate": 0.0024171122994652407,
747
+ "loss": 0.2745,
748
+ "step": 970
749
+ },
750
+ {
751
+ "epoch": 5.24,
752
+ "grad_norm": 1.3407084941864014,
753
+ "learning_rate": 0.0023903743315508022,
754
+ "loss": 0.2775,
755
+ "step": 980
756
+ },
757
+ {
758
+ "epoch": 5.29,
759
+ "grad_norm": 0.7988283038139343,
760
+ "learning_rate": 0.0023636363636363638,
761
+ "loss": 0.2568,
762
+ "step": 990
763
+ },
764
+ {
765
+ "epoch": 5.35,
766
+ "grad_norm": 0.8980028033256531,
767
+ "learning_rate": 0.0023368983957219253,
768
+ "loss": 0.296,
769
+ "step": 1000
770
+ },
771
+ {
772
+ "epoch": 5.4,
773
+ "grad_norm": 0.8847124576568604,
774
+ "learning_rate": 0.002310160427807487,
775
+ "loss": 0.2525,
776
+ "step": 1010
777
+ },
778
+ {
779
+ "epoch": 5.45,
780
+ "grad_norm": 1.3140696287155151,
781
+ "learning_rate": 0.002283422459893048,
782
+ "loss": 0.2967,
783
+ "step": 1020
784
+ },
785
+ {
786
+ "epoch": 5.51,
787
+ "grad_norm": 0.6774911284446716,
788
+ "learning_rate": 0.00225668449197861,
789
+ "loss": 0.2735,
790
+ "step": 1030
791
+ },
792
+ {
793
+ "epoch": 5.56,
794
+ "grad_norm": 0.9686025977134705,
795
+ "learning_rate": 0.002229946524064171,
796
+ "loss": 0.2415,
797
+ "step": 1040
798
+ },
799
+ {
800
+ "epoch": 5.61,
801
+ "grad_norm": 1.3379433155059814,
802
+ "learning_rate": 0.0022032085561497325,
803
+ "loss": 0.2656,
804
+ "step": 1050
805
+ },
806
+ {
807
+ "epoch": 5.67,
808
+ "grad_norm": 0.6908765435218811,
809
+ "learning_rate": 0.002176470588235294,
810
+ "loss": 0.2532,
811
+ "step": 1060
812
+ },
813
+ {
814
+ "epoch": 5.72,
815
+ "grad_norm": 0.8308853507041931,
816
+ "learning_rate": 0.0021497326203208556,
817
+ "loss": 0.2428,
818
+ "step": 1070
819
+ },
820
+ {
821
+ "epoch": 5.78,
822
+ "grad_norm": 1.2064207792282104,
823
+ "learning_rate": 0.002122994652406417,
824
+ "loss": 0.2989,
825
+ "step": 1080
826
+ },
827
+ {
828
+ "epoch": 5.83,
829
+ "grad_norm": 0.8376064896583557,
830
+ "learning_rate": 0.0020962566844919786,
831
+ "loss": 0.2061,
832
+ "step": 1090
833
+ },
834
+ {
835
+ "epoch": 5.88,
836
+ "grad_norm": 0.9363247156143188,
837
+ "learning_rate": 0.00206951871657754,
838
+ "loss": 0.2447,
839
+ "step": 1100
840
+ },
841
+ {
842
+ "epoch": 5.94,
843
+ "grad_norm": 7.874444007873535,
844
+ "learning_rate": 0.0020427807486631017,
845
+ "loss": 0.2254,
846
+ "step": 1110
847
+ },
848
+ {
849
+ "epoch": 5.99,
850
+ "grad_norm": 0.9535788297653198,
851
+ "learning_rate": 0.002016042780748663,
852
+ "loss": 0.2686,
853
+ "step": 1120
854
+ },
855
+ {
856
+ "epoch": 6.0,
857
+ "eval_accuracy": 0.9485981308411215,
858
+ "eval_f1": 0.9488981890553403,
859
+ "eval_loss": 0.16793404519557953,
860
+ "eval_precision": 0.9560571498851578,
861
+ "eval_recall": 0.9437216744429628,
862
+ "eval_runtime": 9.2543,
863
+ "eval_samples_per_second": 184.995,
864
+ "eval_steps_per_second": 11.562,
865
+ "step": 1122
866
+ },
867
+ {
868
+ "epoch": 6.04,
869
+ "grad_norm": 0.9278040528297424,
870
+ "learning_rate": 0.0019893048128342247,
871
+ "loss": 0.256,
872
+ "step": 1130
873
+ },
874
+ {
875
+ "epoch": 6.1,
876
+ "grad_norm": 1.0177885293960571,
877
+ "learning_rate": 0.0019625668449197863,
878
+ "loss": 0.2173,
879
+ "step": 1140
880
+ },
881
+ {
882
+ "epoch": 6.15,
883
+ "grad_norm": 0.5898217558860779,
884
+ "learning_rate": 0.0019358288770053476,
885
+ "loss": 0.2257,
886
+ "step": 1150
887
+ },
888
+ {
889
+ "epoch": 6.2,
890
+ "grad_norm": 5.235673904418945,
891
+ "learning_rate": 0.0019090909090909091,
892
+ "loss": 0.2388,
893
+ "step": 1160
894
+ },
895
+ {
896
+ "epoch": 6.26,
897
+ "grad_norm": 1.1271004676818848,
898
+ "learning_rate": 0.0018823529411764706,
899
+ "loss": 0.2544,
900
+ "step": 1170
901
+ },
902
+ {
903
+ "epoch": 6.31,
904
+ "grad_norm": 0.6136900186538696,
905
+ "learning_rate": 0.001855614973262032,
906
+ "loss": 0.2785,
907
+ "step": 1180
908
+ },
909
+ {
910
+ "epoch": 6.36,
911
+ "grad_norm": 0.9343350529670715,
912
+ "learning_rate": 0.0018288770053475937,
913
+ "loss": 0.2304,
914
+ "step": 1190
915
+ },
916
+ {
917
+ "epoch": 6.42,
918
+ "grad_norm": 0.7129714488983154,
919
+ "learning_rate": 0.001802139037433155,
920
+ "loss": 0.1709,
921
+ "step": 1200
922
+ },
923
+ {
924
+ "epoch": 6.47,
925
+ "grad_norm": 0.8645954132080078,
926
+ "learning_rate": 0.0017754010695187168,
927
+ "loss": 0.2099,
928
+ "step": 1210
929
+ },
930
+ {
931
+ "epoch": 6.52,
932
+ "grad_norm": 0.4692780375480652,
933
+ "learning_rate": 0.001748663101604278,
934
+ "loss": 0.1801,
935
+ "step": 1220
936
+ },
937
+ {
938
+ "epoch": 6.58,
939
+ "grad_norm": 1.1131465435028076,
940
+ "learning_rate": 0.0017219251336898396,
941
+ "loss": 0.2187,
942
+ "step": 1230
943
+ },
944
+ {
945
+ "epoch": 6.63,
946
+ "grad_norm": 1.0496641397476196,
947
+ "learning_rate": 0.0016951871657754011,
948
+ "loss": 0.2381,
949
+ "step": 1240
950
+ },
951
+ {
952
+ "epoch": 6.68,
953
+ "grad_norm": 0.7512268424034119,
954
+ "learning_rate": 0.0016684491978609627,
955
+ "loss": 0.2171,
956
+ "step": 1250
957
+ },
958
+ {
959
+ "epoch": 6.74,
960
+ "grad_norm": 0.9206662774085999,
961
+ "learning_rate": 0.0016417112299465242,
962
+ "loss": 0.1716,
963
+ "step": 1260
964
+ },
965
+ {
966
+ "epoch": 6.79,
967
+ "grad_norm": 1.044285535812378,
968
+ "learning_rate": 0.0016149732620320857,
969
+ "loss": 0.1996,
970
+ "step": 1270
971
+ },
972
+ {
973
+ "epoch": 6.84,
974
+ "grad_norm": 1.5523549318313599,
975
+ "learning_rate": 0.001588235294117647,
976
+ "loss": 0.198,
977
+ "step": 1280
978
+ },
979
+ {
980
+ "epoch": 6.9,
981
+ "grad_norm": 0.7654513120651245,
982
+ "learning_rate": 0.0015614973262032088,
983
+ "loss": 0.2341,
984
+ "step": 1290
985
+ },
986
+ {
987
+ "epoch": 6.95,
988
+ "grad_norm": 1.145663857460022,
989
+ "learning_rate": 0.00153475935828877,
990
+ "loss": 0.2556,
991
+ "step": 1300
992
+ },
993
+ {
994
+ "epoch": 7.0,
995
+ "eval_accuracy": 0.9661214953271028,
996
+ "eval_f1": 0.9619479557860847,
997
+ "eval_loss": 0.09340371936559677,
998
+ "eval_precision": 0.9651383824240083,
999
+ "eval_recall": 0.9598949442531882,
1000
+ "eval_runtime": 9.0216,
1001
+ "eval_samples_per_second": 189.767,
1002
+ "eval_steps_per_second": 11.86,
1003
+ "step": 1309
1004
+ },
1005
+ {
1006
+ "epoch": 7.01,
1007
+ "grad_norm": 0.8554219603538513,
1008
+ "learning_rate": 0.0015080213903743314,
1009
+ "loss": 0.237,
1010
+ "step": 1310
1011
+ },
1012
+ {
1013
+ "epoch": 7.06,
1014
+ "grad_norm": 0.7055748701095581,
1015
+ "learning_rate": 0.0014812834224598931,
1016
+ "loss": 0.2317,
1017
+ "step": 1320
1018
+ },
1019
+ {
1020
+ "epoch": 7.11,
1021
+ "grad_norm": 1.0891897678375244,
1022
+ "learning_rate": 0.0014545454545454545,
1023
+ "loss": 0.1723,
1024
+ "step": 1330
1025
+ },
1026
+ {
1027
+ "epoch": 7.17,
1028
+ "grad_norm": 0.5554465651512146,
1029
+ "learning_rate": 0.0014278074866310162,
1030
+ "loss": 0.1986,
1031
+ "step": 1340
1032
+ },
1033
+ {
1034
+ "epoch": 7.22,
1035
+ "grad_norm": 1.0232211351394653,
1036
+ "learning_rate": 0.0014010695187165775,
1037
+ "loss": 0.2222,
1038
+ "step": 1350
1039
+ },
1040
+ {
1041
+ "epoch": 7.27,
1042
+ "grad_norm": 0.6204003095626831,
1043
+ "learning_rate": 0.001374331550802139,
1044
+ "loss": 0.1827,
1045
+ "step": 1360
1046
+ },
1047
+ {
1048
+ "epoch": 7.33,
1049
+ "grad_norm": 0.7353977560997009,
1050
+ "learning_rate": 0.0013475935828877006,
1051
+ "loss": 0.1649,
1052
+ "step": 1370
1053
+ },
1054
+ {
1055
+ "epoch": 7.38,
1056
+ "grad_norm": 0.734186053276062,
1057
+ "learning_rate": 0.001320855614973262,
1058
+ "loss": 0.194,
1059
+ "step": 1380
1060
+ },
1061
+ {
1062
+ "epoch": 7.43,
1063
+ "grad_norm": 0.47959616780281067,
1064
+ "learning_rate": 0.0012941176470588236,
1065
+ "loss": 0.1763,
1066
+ "step": 1390
1067
+ },
1068
+ {
1069
+ "epoch": 7.49,
1070
+ "grad_norm": 0.6939826607704163,
1071
+ "learning_rate": 0.0012673796791443852,
1072
+ "loss": 0.2286,
1073
+ "step": 1400
1074
+ },
1075
+ {
1076
+ "epoch": 7.54,
1077
+ "grad_norm": 0.948558509349823,
1078
+ "learning_rate": 0.0012406417112299467,
1079
+ "loss": 0.2506,
1080
+ "step": 1410
1081
+ },
1082
+ {
1083
+ "epoch": 7.59,
1084
+ "grad_norm": 0.8466843962669373,
1085
+ "learning_rate": 0.001213903743315508,
1086
+ "loss": 0.2175,
1087
+ "step": 1420
1088
+ },
1089
+ {
1090
+ "epoch": 7.65,
1091
+ "grad_norm": 0.6146303415298462,
1092
+ "learning_rate": 0.0011871657754010695,
1093
+ "loss": 0.1641,
1094
+ "step": 1430
1095
+ },
1096
+ {
1097
+ "epoch": 7.7,
1098
+ "grad_norm": 0.8321207761764526,
1099
+ "learning_rate": 0.001160427807486631,
1100
+ "loss": 0.1903,
1101
+ "step": 1440
1102
+ },
1103
+ {
1104
+ "epoch": 7.75,
1105
+ "grad_norm": 0.7309682965278625,
1106
+ "learning_rate": 0.0011336898395721926,
1107
+ "loss": 0.1981,
1108
+ "step": 1450
1109
+ },
1110
+ {
1111
+ "epoch": 7.81,
1112
+ "grad_norm": 0.5901007652282715,
1113
+ "learning_rate": 0.0011069518716577541,
1114
+ "loss": 0.2011,
1115
+ "step": 1460
1116
+ },
1117
+ {
1118
+ "epoch": 7.86,
1119
+ "grad_norm": 0.9141890406608582,
1120
+ "learning_rate": 0.0010802139037433154,
1121
+ "loss": 0.2735,
1122
+ "step": 1470
1123
+ },
1124
+ {
1125
+ "epoch": 7.91,
1126
+ "grad_norm": 0.813578724861145,
1127
+ "learning_rate": 0.001053475935828877,
1128
+ "loss": 0.2093,
1129
+ "step": 1480
1130
+ },
1131
+ {
1132
+ "epoch": 7.97,
1133
+ "grad_norm": 0.4584049582481384,
1134
+ "learning_rate": 0.0010267379679144385,
1135
+ "loss": 0.1777,
1136
+ "step": 1490
1137
+ },
1138
+ {
1139
+ "epoch": 8.0,
1140
+ "eval_accuracy": 0.969626168224299,
1141
+ "eval_f1": 0.9686486797969157,
1142
+ "eval_loss": 0.08350867033004761,
1143
+ "eval_precision": 0.9696703038283683,
1144
+ "eval_recall": 0.9682591946397131,
1145
+ "eval_runtime": 9.2254,
1146
+ "eval_samples_per_second": 185.574,
1147
+ "eval_steps_per_second": 11.598,
1148
+ "step": 1496
1149
+ },
1150
+ {
1151
+ "epoch": 8.02,
1152
+ "grad_norm": 0.7080217599868774,
1153
+ "learning_rate": 0.001,
1154
+ "loss": 0.1999,
1155
+ "step": 1500
1156
+ },
1157
+ {
1158
+ "epoch": 8.07,
1159
+ "grad_norm": 0.9281997084617615,
1160
+ "learning_rate": 0.0009732620320855614,
1161
+ "loss": 0.1688,
1162
+ "step": 1510
1163
+ },
1164
+ {
1165
+ "epoch": 8.13,
1166
+ "grad_norm": 0.8174493312835693,
1167
+ "learning_rate": 0.000946524064171123,
1168
+ "loss": 0.1731,
1169
+ "step": 1520
1170
+ },
1171
+ {
1172
+ "epoch": 8.18,
1173
+ "grad_norm": 0.6349031925201416,
1174
+ "learning_rate": 0.0009197860962566845,
1175
+ "loss": 0.1672,
1176
+ "step": 1530
1177
+ },
1178
+ {
1179
+ "epoch": 8.24,
1180
+ "grad_norm": 0.8174115419387817,
1181
+ "learning_rate": 0.000893048128342246,
1182
+ "loss": 0.1839,
1183
+ "step": 1540
1184
+ },
1185
+ {
1186
+ "epoch": 8.29,
1187
+ "grad_norm": 0.6900407671928406,
1188
+ "learning_rate": 0.0008663101604278075,
1189
+ "loss": 0.2044,
1190
+ "step": 1550
1191
+ },
1192
+ {
1193
+ "epoch": 8.34,
1194
+ "grad_norm": 0.2948859930038452,
1195
+ "learning_rate": 0.000839572192513369,
1196
+ "loss": 0.1328,
1197
+ "step": 1560
1198
+ },
1199
+ {
1200
+ "epoch": 8.4,
1201
+ "grad_norm": 0.7020041942596436,
1202
+ "learning_rate": 0.0008128342245989305,
1203
+ "loss": 0.1759,
1204
+ "step": 1570
1205
+ },
1206
+ {
1207
+ "epoch": 8.45,
1208
+ "grad_norm": 1.0418401956558228,
1209
+ "learning_rate": 0.000786096256684492,
1210
+ "loss": 0.1777,
1211
+ "step": 1580
1212
+ },
1213
+ {
1214
+ "epoch": 8.5,
1215
+ "grad_norm": 0.7473070025444031,
1216
+ "learning_rate": 0.0007593582887700536,
1217
+ "loss": 0.1631,
1218
+ "step": 1590
1219
+ },
1220
+ {
1221
+ "epoch": 8.56,
1222
+ "grad_norm": 0.8006024360656738,
1223
+ "learning_rate": 0.000732620320855615,
1224
+ "loss": 0.1566,
1225
+ "step": 1600
1226
+ },
1227
+ {
1228
+ "epoch": 8.61,
1229
+ "grad_norm": 1.0594407320022583,
1230
+ "learning_rate": 0.0007058823529411765,
1231
+ "loss": 0.184,
1232
+ "step": 1610
1233
+ },
1234
+ {
1235
+ "epoch": 8.66,
1236
+ "grad_norm": 0.6014285087585449,
1237
+ "learning_rate": 0.000679144385026738,
1238
+ "loss": 0.1583,
1239
+ "step": 1620
1240
+ },
1241
+ {
1242
+ "epoch": 8.72,
1243
+ "grad_norm": 0.6736869812011719,
1244
+ "learning_rate": 0.0006524064171122996,
1245
+ "loss": 0.1468,
1246
+ "step": 1630
1247
+ },
1248
+ {
1249
+ "epoch": 8.77,
1250
+ "grad_norm": 0.6957813501358032,
1251
+ "learning_rate": 0.0006256684491978609,
1252
+ "loss": 0.1731,
1253
+ "step": 1640
1254
+ },
1255
+ {
1256
+ "epoch": 8.82,
1257
+ "grad_norm": 0.5073075294494629,
1258
+ "learning_rate": 0.0005989304812834224,
1259
+ "loss": 0.176,
1260
+ "step": 1650
1261
+ },
1262
+ {
1263
+ "epoch": 8.88,
1264
+ "grad_norm": 0.5485414862632751,
1265
+ "learning_rate": 0.000572192513368984,
1266
+ "loss": 0.1936,
1267
+ "step": 1660
1268
+ },
1269
+ {
1270
+ "epoch": 8.93,
1271
+ "grad_norm": 0.8590062856674194,
1272
+ "learning_rate": 0.0005454545454545455,
1273
+ "loss": 0.1795,
1274
+ "step": 1670
1275
+ },
1276
+ {
1277
+ "epoch": 8.98,
1278
+ "grad_norm": 0.49274083971977234,
1279
+ "learning_rate": 0.000518716577540107,
1280
+ "loss": 0.1607,
1281
+ "step": 1680
1282
+ },
1283
+ {
1284
+ "epoch": 9.0,
1285
+ "eval_accuracy": 0.9772196261682243,
1286
+ "eval_f1": 0.9758896890562156,
1287
+ "eval_loss": 0.07392112910747528,
1288
+ "eval_precision": 0.9732910812266744,
1289
+ "eval_recall": 0.97920005624388,
1290
+ "eval_runtime": 9.2433,
1291
+ "eval_samples_per_second": 185.214,
1292
+ "eval_steps_per_second": 11.576,
1293
+ "step": 1683
1294
+ },
1295
+ {
1296
+ "epoch": 9.04,
1297
+ "grad_norm": 0.4997323751449585,
1298
+ "learning_rate": 0.0004919786096256684,
1299
+ "loss": 0.1352,
1300
+ "step": 1690
1301
+ },
1302
+ {
1303
+ "epoch": 9.09,
1304
+ "grad_norm": 0.5221167206764221,
1305
+ "learning_rate": 0.00046524064171122996,
1306
+ "loss": 0.1597,
1307
+ "step": 1700
1308
+ },
1309
+ {
1310
+ "epoch": 9.14,
1311
+ "grad_norm": 0.6731162071228027,
1312
+ "learning_rate": 0.0004385026737967915,
1313
+ "loss": 0.1639,
1314
+ "step": 1710
1315
+ },
1316
+ {
1317
+ "epoch": 9.2,
1318
+ "grad_norm": 0.5156794786453247,
1319
+ "learning_rate": 0.00041176470588235296,
1320
+ "loss": 0.1667,
1321
+ "step": 1720
1322
+ },
1323
+ {
1324
+ "epoch": 9.25,
1325
+ "grad_norm": 0.767203152179718,
1326
+ "learning_rate": 0.0003850267379679145,
1327
+ "loss": 0.1672,
1328
+ "step": 1730
1329
+ },
1330
+ {
1331
+ "epoch": 9.3,
1332
+ "grad_norm": 0.5664710402488708,
1333
+ "learning_rate": 0.0003582887700534759,
1334
+ "loss": 0.1428,
1335
+ "step": 1740
1336
+ },
1337
+ {
1338
+ "epoch": 9.36,
1339
+ "grad_norm": 0.37641459703445435,
1340
+ "learning_rate": 0.00033155080213903744,
1341
+ "loss": 0.1667,
1342
+ "step": 1750
1343
+ },
1344
+ {
1345
+ "epoch": 9.41,
1346
+ "grad_norm": 0.5527117252349854,
1347
+ "learning_rate": 0.0003048128342245989,
1348
+ "loss": 0.1723,
1349
+ "step": 1760
1350
+ },
1351
+ {
1352
+ "epoch": 9.47,
1353
+ "grad_norm": 0.8746387958526611,
1354
+ "learning_rate": 0.00027807486631016044,
1355
+ "loss": 0.1596,
1356
+ "step": 1770
1357
+ },
1358
+ {
1359
+ "epoch": 9.52,
1360
+ "grad_norm": 0.5461722612380981,
1361
+ "learning_rate": 0.0002513368983957219,
1362
+ "loss": 0.17,
1363
+ "step": 1780
1364
+ },
1365
+ {
1366
+ "epoch": 9.57,
1367
+ "grad_norm": 0.5201784372329712,
1368
+ "learning_rate": 0.00022459893048128345,
1369
+ "loss": 0.1268,
1370
+ "step": 1790
1371
+ },
1372
+ {
1373
+ "epoch": 9.63,
1374
+ "grad_norm": 0.44921737909317017,
1375
+ "learning_rate": 0.00019786096256684492,
1376
+ "loss": 0.1537,
1377
+ "step": 1800
1378
+ },
1379
+ {
1380
+ "epoch": 9.68,
1381
+ "grad_norm": 0.6538177728652954,
1382
+ "learning_rate": 0.00017112299465240642,
1383
+ "loss": 0.1564,
1384
+ "step": 1810
1385
+ },
1386
+ {
1387
+ "epoch": 9.73,
1388
+ "grad_norm": 0.39654332399368286,
1389
+ "learning_rate": 0.00014438502673796793,
1390
+ "loss": 0.1196,
1391
+ "step": 1820
1392
+ },
1393
+ {
1394
+ "epoch": 9.79,
1395
+ "grad_norm": 0.5751528143882751,
1396
+ "learning_rate": 0.00011764705882352942,
1397
+ "loss": 0.1953,
1398
+ "step": 1830
1399
+ },
1400
+ {
1401
+ "epoch": 9.84,
1402
+ "grad_norm": 0.7018762826919556,
1403
+ "learning_rate": 9.09090909090909e-05,
1404
+ "loss": 0.1414,
1405
+ "step": 1840
1406
+ },
1407
+ {
1408
+ "epoch": 9.89,
1409
+ "grad_norm": 0.8955555558204651,
1410
+ "learning_rate": 6.41711229946524e-05,
1411
+ "loss": 0.1415,
1412
+ "step": 1850
1413
+ },
1414
+ {
1415
+ "epoch": 9.95,
1416
+ "grad_norm": 0.29650095105171204,
1417
+ "learning_rate": 3.74331550802139e-05,
1418
+ "loss": 0.1361,
1419
+ "step": 1860
1420
+ },
1421
+ {
1422
+ "epoch": 10.0,
1423
+ "grad_norm": 0.6939311623573303,
1424
+ "learning_rate": 1.0695187165775402e-05,
1425
+ "loss": 0.1898,
1426
+ "step": 1870
1427
+ },
1428
+ {
1429
+ "epoch": 10.0,
1430
+ "eval_accuracy": 0.9789719626168224,
1431
+ "eval_f1": 0.9786328578443323,
1432
+ "eval_loss": 0.06271301954984665,
1433
+ "eval_precision": 0.9764445771965571,
1434
+ "eval_recall": 0.9811556249771411,
1435
+ "eval_runtime": 8.9863,
1436
+ "eval_samples_per_second": 190.512,
1437
+ "eval_steps_per_second": 11.907,
1438
+ "step": 1870
1439
+ },
1440
+ {
1441
+ "epoch": 10.0,
1442
+ "step": 1870,
1443
+ "total_flos": 9.332136680499118e+18,
1444
+ "train_loss": 0.29726991015959553,
1445
+ "train_runtime": 1395.7704,
1446
+ "train_samples_per_second": 85.68,
1447
+ "train_steps_per_second": 1.34
1448
+ }
1449
+ ],
1450
+ "logging_steps": 10,
1451
+ "max_steps": 1870,
1452
+ "num_input_tokens_seen": 0,
1453
+ "num_train_epochs": 10,
1454
+ "save_steps": 500,
1455
+ "total_flos": 9.332136680499118e+18,
1456
+ "train_batch_size": 16,
1457
+ "trial_name": null,
1458
+ "trial_params": null
1459
+ }
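trainer_state.json records the full training log, including one block of eval_* metrics per epoch in "log_history" (the best checkpoint is at step 1870 with eval_accuracy ≈ 0.979). A short sketch for pulling those per-epoch evaluation rows back out of the file:

```python
# Reads the trainer_state.json added in this commit and lists the per-epoch
# evaluation entries stored in its "log_history" array.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

eval_rows = [entry for entry in state["log_history"] if "eval_accuracy" in entry]
for row in eval_rows:
    print(f"epoch {row['epoch']:>4}: "
          f"acc={row['eval_accuracy']:.4f}  f1={row['eval_f1']:.4f}  loss={row['eval_loss']:.4f}")

best = max(eval_rows, key=lambda r: r["eval_accuracy"])
print("best step:", best["step"], "accuracy:", best["eval_accuracy"])
```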