selmamalak committed
Commit bc4804e · verified · 1 Parent(s): c89d314

End of training

Files changed (5)
  1. README.md +5 -5
  2. all_results.json +16 -0
  3. eval_results.json +11 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1676 -0
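
Note: the checkpoint path recorded in trainer_state.json ("deit-base-patch16-224-finetuned-lora-medmnistv2") indicates this is a LoRA adapter trained on top of facebook/deit-base-patch16-224. A minimal, non-authoritative sketch of loading such an adapter for inference with transformers + peft is shown below; the adapter repo id and label count are assumptions, not stated anywhere in this commit.

```python
# Hedged sketch: load the LoRA adapter on top of the base DeiT checkpoint.
# The adapter repo id and NUM_LABELS are placeholders -- adjust to the actual
# repository and the dataset's class count.
from transformers import AutoImageProcessor, AutoModelForImageClassification
from peft import PeftModel

BASE = "facebook/deit-base-patch16-224"
ADAPTER = "selmamalak/deit-base-patch16-224-finetuned-lora-medmnistv2"  # assumed repo id
NUM_LABELS = 7  # placeholder: must match the fine-tuned classification head

processor = AutoImageProcessor.from_pretrained(BASE)
base_model = AutoModelForImageClassification.from_pretrained(
    BASE, num_labels=NUM_LABELS, ignore_mismatched_sizes=True
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
```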
README.md CHANGED
@@ -23,11 +23,11 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [facebook/deit-base-patch16-224](https://huggingface.co/facebook/deit-base-patch16-224) on the medmnist-v2 dataset.
 It achieves the following results on the evaluation set:
- - Loss: 0.2558
- - Accuracy: 0.8997
- - Precision: 0.8463
- - Recall: 0.8395
- - F1: 0.8416
+ - Loss: 0.4815
+ - Accuracy: 0.8080
+ - Precision: 0.7703
+ - Recall: 0.7686
+ - F1: 0.7650
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.8080190282025145,
+ "eval_f1": 0.7649840068771034,
+ "eval_loss": 0.4814508855342865,
+ "eval_precision": 0.7703206524276631,
+ "eval_recall": 0.7685966768079108,
+ "eval_runtime": 72.3887,
+ "eval_samples_per_second": 121.967,
+ "eval_steps_per_second": 7.625,
+ "total_flos": 1.0878579515820442e+19,
+ "train_loss": 0.69453261346992,
+ "train_runtime": 2383.9912,
+ "train_samples_per_second": 58.473,
+ "train_steps_per_second": 0.914
+ }
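
The eval_accuracy / eval_precision / eval_recall / eval_f1 values above (and in the README diff) are the kind of dictionary a `compute_metrics` callback returns from `Trainer.evaluate()`. A minimal sketch of such a callback, assuming macro-averaged scores from scikit-learn (the averaging actually used for this run is not recorded in the commit):

```python
# Hedged sketch of a Trainer-style compute_metrics callback; macro averaging
# is an assumption, not confirmed by this commit.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```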
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.8080190282025145,
+ "eval_f1": 0.7649840068771034,
+ "eval_loss": 0.4814508855342865,
+ "eval_precision": 0.7703206524276631,
+ "eval_recall": 0.7685966768079108,
+ "eval_runtime": 72.3887,
+ "eval_samples_per_second": 121.967,
+ "eval_steps_per_second": 7.625
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 10.0,
+ "total_flos": 1.0878579515820442e+19,
+ "train_loss": 0.69453261346992,
+ "train_runtime": 2383.9912,
+ "train_samples_per_second": 58.473,
+ "train_steps_per_second": 0.914
+ }
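
train_results.json and eval_results.json are the same end-of-run summaries split by phase (all_results.json is their union). A small sketch for loading them locally, with paths assumed relative to the repository root:

```python
# Sketch: load the per-phase summary files written by Trainer at the end of training.
import json
from pathlib import Path

summaries = {p.stem: json.loads(p.read_text())
             for p in map(Path, ["train_results.json", "eval_results.json"])}

print("train loss:", summaries["train_results"]["train_loss"])
print("eval accuracy:", summaries["eval_results"]["eval_accuracy"])
print("eval throughput:",
      summaries["eval_results"]["eval_samples_per_second"], "samples/s")
```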
trainer_state.json ADDED
@@ -0,0 +1,1676 @@
1
+ {
2
+ "best_metric": 0.9061990212071778,
3
+ "best_model_checkpoint": "deit-base-patch16-224-finetuned-lora-medmnistv2/checkpoint-1526",
4
+ "epoch": 10.0,
5
+ "eval_steps": 500,
6
+ "global_step": 2180,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.05,
13
+ "grad_norm": 3.7138895988464355,
14
+ "learning_rate": 0.004977064220183487,
15
+ "loss": 1.9548,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.09,
20
+ "grad_norm": 3.300619125366211,
21
+ "learning_rate": 0.004954128440366973,
22
+ "loss": 1.5287,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.14,
27
+ "grad_norm": 2.6508607864379883,
28
+ "learning_rate": 0.004931192660550459,
29
+ "loss": 1.3837,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.18,
34
+ "grad_norm": 2.1517019271850586,
35
+ "learning_rate": 0.004908256880733945,
36
+ "loss": 1.2375,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.23,
41
+ "grad_norm": 1.7724459171295166,
42
+ "learning_rate": 0.004885321100917431,
43
+ "loss": 1.2391,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.28,
48
+ "grad_norm": 2.2682437896728516,
49
+ "learning_rate": 0.004862385321100918,
50
+ "loss": 1.1129,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.32,
55
+ "grad_norm": 1.4531511068344116,
56
+ "learning_rate": 0.004839449541284404,
57
+ "loss": 1.1138,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.37,
62
+ "grad_norm": 2.0359365940093994,
63
+ "learning_rate": 0.00481651376146789,
64
+ "loss": 1.0654,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.41,
69
+ "grad_norm": 1.8885788917541504,
70
+ "learning_rate": 0.004793577981651377,
71
+ "loss": 1.1075,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.46,
76
+ "grad_norm": 2.3016016483306885,
77
+ "learning_rate": 0.0047706422018348625,
78
+ "loss": 1.075,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.5,
83
+ "grad_norm": 2.251934289932251,
84
+ "learning_rate": 0.004747706422018348,
85
+ "loss": 1.0396,
86
+ "step": 110
87
+ },
88
+ {
89
+ "epoch": 0.55,
90
+ "grad_norm": 1.179795503616333,
91
+ "learning_rate": 0.004724770642201835,
92
+ "loss": 1.036,
93
+ "step": 120
94
+ },
95
+ {
96
+ "epoch": 0.6,
97
+ "grad_norm": 1.6593384742736816,
98
+ "learning_rate": 0.004701834862385321,
99
+ "loss": 1.0118,
100
+ "step": 130
101
+ },
102
+ {
103
+ "epoch": 0.64,
104
+ "grad_norm": 2.6405937671661377,
105
+ "learning_rate": 0.004678899082568808,
106
+ "loss": 0.9888,
107
+ "step": 140
108
+ },
109
+ {
110
+ "epoch": 0.69,
111
+ "grad_norm": 1.8541159629821777,
112
+ "learning_rate": 0.004655963302752294,
113
+ "loss": 0.9314,
114
+ "step": 150
115
+ },
116
+ {
117
+ "epoch": 0.73,
118
+ "grad_norm": 1.5319859981536865,
119
+ "learning_rate": 0.00463302752293578,
120
+ "loss": 0.9473,
121
+ "step": 160
122
+ },
123
+ {
124
+ "epoch": 0.78,
125
+ "grad_norm": 1.5615304708480835,
126
+ "learning_rate": 0.004610091743119266,
127
+ "loss": 0.9362,
128
+ "step": 170
129
+ },
130
+ {
131
+ "epoch": 0.83,
132
+ "grad_norm": 1.7751407623291016,
133
+ "learning_rate": 0.0045871559633027525,
134
+ "loss": 1.0674,
135
+ "step": 180
136
+ },
137
+ {
138
+ "epoch": 0.87,
139
+ "grad_norm": 1.6211912631988525,
140
+ "learning_rate": 0.004564220183486238,
141
+ "loss": 1.0766,
142
+ "step": 190
143
+ },
144
+ {
145
+ "epoch": 0.92,
146
+ "grad_norm": 2.1214470863342285,
147
+ "learning_rate": 0.004541284403669725,
148
+ "loss": 0.9675,
149
+ "step": 200
150
+ },
151
+ {
152
+ "epoch": 0.96,
153
+ "grad_norm": 2.0745882987976074,
154
+ "learning_rate": 0.004518348623853211,
155
+ "loss": 0.9804,
156
+ "step": 210
157
+ },
158
+ {
159
+ "epoch": 1.0,
160
+ "eval_accuracy": 0.7243066884176182,
161
+ "eval_f1": 0.6426465488409587,
162
+ "eval_loss": 0.6885228753089905,
163
+ "eval_precision": 0.7882942358945201,
164
+ "eval_recall": 0.6660603565571163,
165
+ "eval_runtime": 20.0858,
166
+ "eval_samples_per_second": 122.076,
167
+ "eval_steps_per_second": 7.667,
168
+ "step": 218
169
+ },
170
+ {
171
+ "epoch": 1.01,
172
+ "grad_norm": 1.6901938915252686,
173
+ "learning_rate": 0.004495412844036698,
174
+ "loss": 1.0215,
175
+ "step": 220
176
+ },
177
+ {
178
+ "epoch": 1.06,
179
+ "grad_norm": 1.575714111328125,
180
+ "learning_rate": 0.004472477064220184,
181
+ "loss": 0.8812,
182
+ "step": 230
183
+ },
184
+ {
185
+ "epoch": 1.1,
186
+ "grad_norm": 2.2109575271606445,
187
+ "learning_rate": 0.0044495412844036695,
188
+ "loss": 0.8834,
189
+ "step": 240
190
+ },
191
+ {
192
+ "epoch": 1.15,
193
+ "grad_norm": 1.79192316532135,
194
+ "learning_rate": 0.004426605504587156,
195
+ "loss": 0.8381,
196
+ "step": 250
197
+ },
198
+ {
199
+ "epoch": 1.19,
200
+ "grad_norm": 2.0963821411132812,
201
+ "learning_rate": 0.004403669724770643,
202
+ "loss": 1.0578,
203
+ "step": 260
204
+ },
205
+ {
206
+ "epoch": 1.24,
207
+ "grad_norm": 1.5217491388320923,
208
+ "learning_rate": 0.004380733944954128,
209
+ "loss": 0.9884,
210
+ "step": 270
211
+ },
212
+ {
213
+ "epoch": 1.28,
214
+ "grad_norm": 1.9401984214782715,
215
+ "learning_rate": 0.004357798165137615,
216
+ "loss": 0.9571,
217
+ "step": 280
218
+ },
219
+ {
220
+ "epoch": 1.33,
221
+ "grad_norm": 2.2098371982574463,
222
+ "learning_rate": 0.0043348623853211015,
223
+ "loss": 0.8079,
224
+ "step": 290
225
+ },
226
+ {
227
+ "epoch": 1.38,
228
+ "grad_norm": 1.31866455078125,
229
+ "learning_rate": 0.004311926605504587,
230
+ "loss": 0.9675,
231
+ "step": 300
232
+ },
233
+ {
234
+ "epoch": 1.42,
235
+ "grad_norm": 1.6946964263916016,
236
+ "learning_rate": 0.004288990825688073,
237
+ "loss": 0.9153,
238
+ "step": 310
239
+ },
240
+ {
241
+ "epoch": 1.47,
242
+ "grad_norm": 1.3618961572647095,
243
+ "learning_rate": 0.0042660550458715595,
244
+ "loss": 0.9402,
245
+ "step": 320
246
+ },
247
+ {
248
+ "epoch": 1.51,
249
+ "grad_norm": 1.3714025020599365,
250
+ "learning_rate": 0.004243119266055046,
251
+ "loss": 0.9622,
252
+ "step": 330
253
+ },
254
+ {
255
+ "epoch": 1.56,
256
+ "grad_norm": 1.4716968536376953,
257
+ "learning_rate": 0.004220183486238533,
258
+ "loss": 0.9493,
259
+ "step": 340
260
+ },
261
+ {
262
+ "epoch": 1.61,
263
+ "grad_norm": 1.8330761194229126,
264
+ "learning_rate": 0.004197247706422018,
265
+ "loss": 0.8914,
266
+ "step": 350
267
+ },
268
+ {
269
+ "epoch": 1.65,
270
+ "grad_norm": 1.2121117115020752,
271
+ "learning_rate": 0.004174311926605505,
272
+ "loss": 0.9925,
273
+ "step": 360
274
+ },
275
+ {
276
+ "epoch": 1.7,
277
+ "grad_norm": 1.6328595876693726,
278
+ "learning_rate": 0.004151376146788991,
279
+ "loss": 0.8127,
280
+ "step": 370
281
+ },
282
+ {
283
+ "epoch": 1.74,
284
+ "grad_norm": 1.02108895778656,
285
+ "learning_rate": 0.004128440366972477,
286
+ "loss": 0.8554,
287
+ "step": 380
288
+ },
289
+ {
290
+ "epoch": 1.79,
291
+ "grad_norm": 1.5831329822540283,
292
+ "learning_rate": 0.004105504587155963,
293
+ "loss": 0.8917,
294
+ "step": 390
295
+ },
296
+ {
297
+ "epoch": 1.83,
298
+ "grad_norm": 1.4080381393432617,
299
+ "learning_rate": 0.00408256880733945,
300
+ "loss": 0.8993,
301
+ "step": 400
302
+ },
303
+ {
304
+ "epoch": 1.88,
305
+ "grad_norm": 1.5948288440704346,
306
+ "learning_rate": 0.004059633027522936,
307
+ "loss": 0.8667,
308
+ "step": 410
309
+ },
310
+ {
311
+ "epoch": 1.93,
312
+ "grad_norm": 1.1762131452560425,
313
+ "learning_rate": 0.004036697247706422,
314
+ "loss": 0.8155,
315
+ "step": 420
316
+ },
317
+ {
318
+ "epoch": 1.97,
319
+ "grad_norm": 1.6054571866989136,
320
+ "learning_rate": 0.0040137614678899085,
321
+ "loss": 0.9277,
322
+ "step": 430
323
+ },
324
+ {
325
+ "epoch": 2.0,
326
+ "eval_accuracy": 0.850326264274062,
327
+ "eval_f1": 0.7680457132974591,
328
+ "eval_loss": 0.35129514336586,
329
+ "eval_precision": 0.7635420667300096,
330
+ "eval_recall": 0.7943474051999218,
331
+ "eval_runtime": 20.1251,
332
+ "eval_samples_per_second": 121.838,
333
+ "eval_steps_per_second": 7.652,
334
+ "step": 436
335
+ },
336
+ {
337
+ "epoch": 2.02,
338
+ "grad_norm": 1.8250489234924316,
339
+ "learning_rate": 0.003990825688073394,
340
+ "loss": 0.8701,
341
+ "step": 440
342
+ },
343
+ {
344
+ "epoch": 2.06,
345
+ "grad_norm": 1.3763368129730225,
346
+ "learning_rate": 0.003967889908256881,
347
+ "loss": 0.8898,
348
+ "step": 450
349
+ },
350
+ {
351
+ "epoch": 2.11,
352
+ "grad_norm": 1.257175326347351,
353
+ "learning_rate": 0.003944954128440367,
354
+ "loss": 0.8269,
355
+ "step": 460
356
+ },
357
+ {
358
+ "epoch": 2.16,
359
+ "grad_norm": 1.4226521253585815,
360
+ "learning_rate": 0.003922018348623853,
361
+ "loss": 0.7581,
362
+ "step": 470
363
+ },
364
+ {
365
+ "epoch": 2.2,
366
+ "grad_norm": 1.8739672899246216,
367
+ "learning_rate": 0.0038990825688073397,
368
+ "loss": 0.8385,
369
+ "step": 480
370
+ },
371
+ {
372
+ "epoch": 2.25,
373
+ "grad_norm": 1.9286770820617676,
374
+ "learning_rate": 0.003876146788990826,
375
+ "loss": 0.8155,
376
+ "step": 490
377
+ },
378
+ {
379
+ "epoch": 2.29,
380
+ "grad_norm": 1.2533843517303467,
381
+ "learning_rate": 0.0038532110091743124,
382
+ "loss": 0.7766,
383
+ "step": 500
384
+ },
385
+ {
386
+ "epoch": 2.34,
387
+ "grad_norm": 1.5577070713043213,
388
+ "learning_rate": 0.003830275229357798,
389
+ "loss": 0.7792,
390
+ "step": 510
391
+ },
392
+ {
393
+ "epoch": 2.39,
394
+ "grad_norm": 1.9237123727798462,
395
+ "learning_rate": 0.0038073394495412843,
396
+ "loss": 0.8341,
397
+ "step": 520
398
+ },
399
+ {
400
+ "epoch": 2.43,
401
+ "grad_norm": 1.455471158027649,
402
+ "learning_rate": 0.003784403669724771,
403
+ "loss": 0.8508,
404
+ "step": 530
405
+ },
406
+ {
407
+ "epoch": 2.48,
408
+ "grad_norm": 1.77620267868042,
409
+ "learning_rate": 0.003761467889908257,
410
+ "loss": 0.8548,
411
+ "step": 540
412
+ },
413
+ {
414
+ "epoch": 2.52,
415
+ "grad_norm": 1.7046033143997192,
416
+ "learning_rate": 0.003738532110091743,
417
+ "loss": 0.7772,
418
+ "step": 550
419
+ },
420
+ {
421
+ "epoch": 2.57,
422
+ "grad_norm": 1.554868459701538,
423
+ "learning_rate": 0.0037155963302752293,
424
+ "loss": 0.7189,
425
+ "step": 560
426
+ },
427
+ {
428
+ "epoch": 2.61,
429
+ "grad_norm": 1.6947243213653564,
430
+ "learning_rate": 0.003692660550458716,
431
+ "loss": 0.8345,
432
+ "step": 570
433
+ },
434
+ {
435
+ "epoch": 2.66,
436
+ "grad_norm": 1.3895587921142578,
437
+ "learning_rate": 0.003669724770642202,
438
+ "loss": 0.758,
439
+ "step": 580
440
+ },
441
+ {
442
+ "epoch": 2.71,
443
+ "grad_norm": 1.3375391960144043,
444
+ "learning_rate": 0.0036467889908256878,
445
+ "loss": 0.7893,
446
+ "step": 590
447
+ },
448
+ {
449
+ "epoch": 2.75,
450
+ "grad_norm": 2.090715169906616,
451
+ "learning_rate": 0.0036238532110091743,
452
+ "loss": 0.795,
453
+ "step": 600
454
+ },
455
+ {
456
+ "epoch": 2.8,
457
+ "grad_norm": 1.4841378927230835,
458
+ "learning_rate": 0.0036009174311926605,
459
+ "loss": 0.8813,
460
+ "step": 610
461
+ },
462
+ {
463
+ "epoch": 2.84,
464
+ "grad_norm": 1.7425097227096558,
465
+ "learning_rate": 0.003577981651376147,
466
+ "loss": 0.8691,
467
+ "step": 620
468
+ },
469
+ {
470
+ "epoch": 2.89,
471
+ "grad_norm": 1.592509388923645,
472
+ "learning_rate": 0.003555045871559633,
473
+ "loss": 0.6923,
474
+ "step": 630
475
+ },
476
+ {
477
+ "epoch": 2.94,
478
+ "grad_norm": 1.178277611732483,
479
+ "learning_rate": 0.0035321100917431194,
480
+ "loss": 0.8297,
481
+ "step": 640
482
+ },
483
+ {
484
+ "epoch": 2.98,
485
+ "grad_norm": 1.2316228151321411,
486
+ "learning_rate": 0.0035091743119266055,
487
+ "loss": 0.8144,
488
+ "step": 650
489
+ },
490
+ {
491
+ "epoch": 3.0,
492
+ "eval_accuracy": 0.8544045676998369,
493
+ "eval_f1": 0.790941601757907,
494
+ "eval_loss": 0.36142271757125854,
495
+ "eval_precision": 0.8330895565628892,
496
+ "eval_recall": 0.7960896326958813,
497
+ "eval_runtime": 20.0968,
498
+ "eval_samples_per_second": 122.01,
499
+ "eval_steps_per_second": 7.663,
500
+ "step": 654
501
+ },
502
+ {
503
+ "epoch": 3.03,
504
+ "grad_norm": 1.9384276866912842,
505
+ "learning_rate": 0.003486238532110092,
506
+ "loss": 0.7581,
507
+ "step": 660
508
+ },
509
+ {
510
+ "epoch": 3.07,
511
+ "grad_norm": 1.494698405265808,
512
+ "learning_rate": 0.003463302752293578,
513
+ "loss": 0.7494,
514
+ "step": 670
515
+ },
516
+ {
517
+ "epoch": 3.12,
518
+ "grad_norm": 1.741979956626892,
519
+ "learning_rate": 0.0034403669724770644,
520
+ "loss": 0.6721,
521
+ "step": 680
522
+ },
523
+ {
524
+ "epoch": 3.17,
525
+ "grad_norm": 1.857740879058838,
526
+ "learning_rate": 0.0034174311926605506,
527
+ "loss": 0.7034,
528
+ "step": 690
529
+ },
530
+ {
531
+ "epoch": 3.21,
532
+ "grad_norm": 1.5749099254608154,
533
+ "learning_rate": 0.003394495412844037,
534
+ "loss": 0.7933,
535
+ "step": 700
536
+ },
537
+ {
538
+ "epoch": 3.26,
539
+ "grad_norm": 1.865283727645874,
540
+ "learning_rate": 0.003371559633027523,
541
+ "loss": 0.7655,
542
+ "step": 710
543
+ },
544
+ {
545
+ "epoch": 3.3,
546
+ "grad_norm": 2.0680484771728516,
547
+ "learning_rate": 0.003348623853211009,
548
+ "loss": 0.7182,
549
+ "step": 720
550
+ },
551
+ {
552
+ "epoch": 3.35,
553
+ "grad_norm": 1.419133186340332,
554
+ "learning_rate": 0.0033256880733944956,
555
+ "loss": 0.7587,
556
+ "step": 730
557
+ },
558
+ {
559
+ "epoch": 3.39,
560
+ "grad_norm": 1.6129292249679565,
561
+ "learning_rate": 0.0033027522935779817,
562
+ "loss": 0.6984,
563
+ "step": 740
564
+ },
565
+ {
566
+ "epoch": 3.44,
567
+ "grad_norm": 1.2033915519714355,
568
+ "learning_rate": 0.003279816513761468,
569
+ "loss": 0.8069,
570
+ "step": 750
571
+ },
572
+ {
573
+ "epoch": 3.49,
574
+ "grad_norm": 1.6685665845870972,
575
+ "learning_rate": 0.003256880733944954,
576
+ "loss": 0.7586,
577
+ "step": 760
578
+ },
579
+ {
580
+ "epoch": 3.53,
581
+ "grad_norm": 1.3577849864959717,
582
+ "learning_rate": 0.0032339449541284406,
583
+ "loss": 0.7577,
584
+ "step": 770
585
+ },
586
+ {
587
+ "epoch": 3.58,
588
+ "grad_norm": 1.4581727981567383,
589
+ "learning_rate": 0.003211009174311927,
590
+ "loss": 0.7977,
591
+ "step": 780
592
+ },
593
+ {
594
+ "epoch": 3.62,
595
+ "grad_norm": 1.547544240951538,
596
+ "learning_rate": 0.0031880733944954125,
597
+ "loss": 0.7738,
598
+ "step": 790
599
+ },
600
+ {
601
+ "epoch": 3.67,
602
+ "grad_norm": 1.6125229597091675,
603
+ "learning_rate": 0.003165137614678899,
604
+ "loss": 0.748,
605
+ "step": 800
606
+ },
607
+ {
608
+ "epoch": 3.72,
609
+ "grad_norm": 1.4292904138565063,
610
+ "learning_rate": 0.0031422018348623852,
611
+ "loss": 0.7275,
612
+ "step": 810
613
+ },
614
+ {
615
+ "epoch": 3.76,
616
+ "grad_norm": 1.8630807399749756,
617
+ "learning_rate": 0.003119266055045872,
618
+ "loss": 0.7082,
619
+ "step": 820
620
+ },
621
+ {
622
+ "epoch": 3.81,
623
+ "grad_norm": 1.2151238918304443,
624
+ "learning_rate": 0.0030963302752293575,
625
+ "loss": 0.7476,
626
+ "step": 830
627
+ },
628
+ {
629
+ "epoch": 3.85,
630
+ "grad_norm": 1.1003532409667969,
631
+ "learning_rate": 0.003073394495412844,
632
+ "loss": 0.6765,
633
+ "step": 840
634
+ },
635
+ {
636
+ "epoch": 3.9,
637
+ "grad_norm": 1.6830847263336182,
638
+ "learning_rate": 0.0030504587155963303,
639
+ "loss": 0.6833,
640
+ "step": 850
641
+ },
642
+ {
643
+ "epoch": 3.94,
644
+ "grad_norm": 1.3484947681427002,
645
+ "learning_rate": 0.003027522935779817,
646
+ "loss": 0.7556,
647
+ "step": 860
648
+ },
649
+ {
650
+ "epoch": 3.99,
651
+ "grad_norm": 1.2535579204559326,
652
+ "learning_rate": 0.0030045871559633026,
653
+ "loss": 0.7344,
654
+ "step": 870
655
+ },
656
+ {
657
+ "epoch": 4.0,
658
+ "eval_accuracy": 0.8609298531810766,
659
+ "eval_f1": 0.7885642715655152,
660
+ "eval_loss": 0.33706870675086975,
661
+ "eval_precision": 0.8326532721424038,
662
+ "eval_recall": 0.8017504308284955,
663
+ "eval_runtime": 20.1745,
664
+ "eval_samples_per_second": 121.54,
665
+ "eval_steps_per_second": 7.633,
666
+ "step": 872
667
+ },
668
+ {
669
+ "epoch": 4.04,
670
+ "grad_norm": 1.1233114004135132,
671
+ "learning_rate": 0.002981651376146789,
672
+ "loss": 0.6551,
673
+ "step": 880
674
+ },
675
+ {
676
+ "epoch": 4.08,
677
+ "grad_norm": 1.2706884145736694,
678
+ "learning_rate": 0.0029587155963302753,
679
+ "loss": 0.7096,
680
+ "step": 890
681
+ },
682
+ {
683
+ "epoch": 4.13,
684
+ "grad_norm": 1.4524619579315186,
685
+ "learning_rate": 0.002935779816513762,
686
+ "loss": 0.7413,
687
+ "step": 900
688
+ },
689
+ {
690
+ "epoch": 4.17,
691
+ "grad_norm": 1.3791077136993408,
692
+ "learning_rate": 0.0029128440366972476,
693
+ "loss": 0.7428,
694
+ "step": 910
695
+ },
696
+ {
697
+ "epoch": 4.22,
698
+ "grad_norm": 1.3151274919509888,
699
+ "learning_rate": 0.0028899082568807338,
700
+ "loss": 0.7053,
701
+ "step": 920
702
+ },
703
+ {
704
+ "epoch": 4.27,
705
+ "grad_norm": 1.2521573305130005,
706
+ "learning_rate": 0.0028669724770642203,
707
+ "loss": 0.6814,
708
+ "step": 930
709
+ },
710
+ {
711
+ "epoch": 4.31,
712
+ "grad_norm": 1.200779676437378,
713
+ "learning_rate": 0.0028440366972477065,
714
+ "loss": 0.6757,
715
+ "step": 940
716
+ },
717
+ {
718
+ "epoch": 4.36,
719
+ "grad_norm": 1.3665342330932617,
720
+ "learning_rate": 0.0028211009174311926,
721
+ "loss": 0.677,
722
+ "step": 950
723
+ },
724
+ {
725
+ "epoch": 4.4,
726
+ "grad_norm": 1.4855738878250122,
727
+ "learning_rate": 0.002798165137614679,
728
+ "loss": 0.6805,
729
+ "step": 960
730
+ },
731
+ {
732
+ "epoch": 4.45,
733
+ "grad_norm": 1.2765145301818848,
734
+ "learning_rate": 0.0027752293577981654,
735
+ "loss": 0.6568,
736
+ "step": 970
737
+ },
738
+ {
739
+ "epoch": 4.5,
740
+ "grad_norm": 1.2457036972045898,
741
+ "learning_rate": 0.0027522935779816515,
742
+ "loss": 0.7198,
743
+ "step": 980
744
+ },
745
+ {
746
+ "epoch": 4.54,
747
+ "grad_norm": 1.3267652988433838,
748
+ "learning_rate": 0.0027293577981651372,
749
+ "loss": 0.6578,
750
+ "step": 990
751
+ },
752
+ {
753
+ "epoch": 4.59,
754
+ "grad_norm": 1.409703016281128,
755
+ "learning_rate": 0.002706422018348624,
756
+ "loss": 0.695,
757
+ "step": 1000
758
+ },
759
+ {
760
+ "epoch": 4.63,
761
+ "grad_norm": 1.089101791381836,
762
+ "learning_rate": 0.00268348623853211,
763
+ "loss": 0.6934,
764
+ "step": 1010
765
+ },
766
+ {
767
+ "epoch": 4.68,
768
+ "grad_norm": 1.238553762435913,
769
+ "learning_rate": 0.0026605504587155966,
770
+ "loss": 0.6932,
771
+ "step": 1020
772
+ },
773
+ {
774
+ "epoch": 4.72,
775
+ "grad_norm": 1.3457752466201782,
776
+ "learning_rate": 0.0026376146788990823,
777
+ "loss": 0.7615,
778
+ "step": 1030
779
+ },
780
+ {
781
+ "epoch": 4.77,
782
+ "grad_norm": 1.3853940963745117,
783
+ "learning_rate": 0.002614678899082569,
784
+ "loss": 0.7032,
785
+ "step": 1040
786
+ },
787
+ {
788
+ "epoch": 4.82,
789
+ "grad_norm": 1.5760701894760132,
790
+ "learning_rate": 0.002591743119266055,
791
+ "loss": 0.684,
792
+ "step": 1050
793
+ },
794
+ {
795
+ "epoch": 4.86,
796
+ "grad_norm": 1.2746469974517822,
797
+ "learning_rate": 0.0025688073394495416,
798
+ "loss": 0.7168,
799
+ "step": 1060
800
+ },
801
+ {
802
+ "epoch": 4.91,
803
+ "grad_norm": 1.0423808097839355,
804
+ "learning_rate": 0.0025458715596330273,
805
+ "loss": 0.7036,
806
+ "step": 1070
807
+ },
808
+ {
809
+ "epoch": 4.95,
810
+ "grad_norm": 1.255771279335022,
811
+ "learning_rate": 0.0025229357798165135,
812
+ "loss": 0.7069,
813
+ "step": 1080
814
+ },
815
+ {
816
+ "epoch": 5.0,
817
+ "grad_norm": 1.7328591346740723,
818
+ "learning_rate": 0.0025,
819
+ "loss": 0.7181,
820
+ "step": 1090
821
+ },
822
+ {
823
+ "epoch": 5.0,
824
+ "eval_accuracy": 0.8923327895595432,
825
+ "eval_f1": 0.8095838184642776,
826
+ "eval_loss": 0.2933848798274994,
827
+ "eval_precision": 0.8060491329782299,
828
+ "eval_recall": 0.8389422669715082,
829
+ "eval_runtime": 20.1372,
830
+ "eval_samples_per_second": 121.765,
831
+ "eval_steps_per_second": 7.648,
832
+ "step": 1090
833
+ },
834
+ {
835
+ "epoch": 5.05,
836
+ "grad_norm": 1.0849229097366333,
837
+ "learning_rate": 0.0024770642201834866,
838
+ "loss": 0.6191,
839
+ "step": 1100
840
+ },
841
+ {
842
+ "epoch": 5.09,
843
+ "grad_norm": 1.3102843761444092,
844
+ "learning_rate": 0.0024541284403669724,
845
+ "loss": 0.6732,
846
+ "step": 1110
847
+ },
848
+ {
849
+ "epoch": 5.14,
850
+ "grad_norm": 1.0374155044555664,
851
+ "learning_rate": 0.002431192660550459,
852
+ "loss": 0.654,
853
+ "step": 1120
854
+ },
855
+ {
856
+ "epoch": 5.18,
857
+ "grad_norm": 2.718107223510742,
858
+ "learning_rate": 0.002408256880733945,
859
+ "loss": 0.5944,
860
+ "step": 1130
861
+ },
862
+ {
863
+ "epoch": 5.23,
864
+ "grad_norm": 1.9114854335784912,
865
+ "learning_rate": 0.0023853211009174312,
866
+ "loss": 0.6039,
867
+ "step": 1140
868
+ },
869
+ {
870
+ "epoch": 5.28,
871
+ "grad_norm": 1.1414576768875122,
872
+ "learning_rate": 0.0023623853211009174,
873
+ "loss": 0.6064,
874
+ "step": 1150
875
+ },
876
+ {
877
+ "epoch": 5.32,
878
+ "grad_norm": 1.319360613822937,
879
+ "learning_rate": 0.002339449541284404,
880
+ "loss": 0.6335,
881
+ "step": 1160
882
+ },
883
+ {
884
+ "epoch": 5.37,
885
+ "grad_norm": 1.377007246017456,
886
+ "learning_rate": 0.00231651376146789,
887
+ "loss": 0.6431,
888
+ "step": 1170
889
+ },
890
+ {
891
+ "epoch": 5.41,
892
+ "grad_norm": 1.3753886222839355,
893
+ "learning_rate": 0.0022935779816513763,
894
+ "loss": 0.6594,
895
+ "step": 1180
896
+ },
897
+ {
898
+ "epoch": 5.46,
899
+ "grad_norm": 1.0963853597640991,
900
+ "learning_rate": 0.0022706422018348624,
901
+ "loss": 0.6999,
902
+ "step": 1190
903
+ },
904
+ {
905
+ "epoch": 5.5,
906
+ "grad_norm": 1.3119159936904907,
907
+ "learning_rate": 0.002247706422018349,
908
+ "loss": 0.5879,
909
+ "step": 1200
910
+ },
911
+ {
912
+ "epoch": 5.55,
913
+ "grad_norm": 1.000196099281311,
914
+ "learning_rate": 0.0022247706422018347,
915
+ "loss": 0.6559,
916
+ "step": 1210
917
+ },
918
+ {
919
+ "epoch": 5.6,
920
+ "grad_norm": 1.1916228532791138,
921
+ "learning_rate": 0.0022018348623853213,
922
+ "loss": 0.6322,
923
+ "step": 1220
924
+ },
925
+ {
926
+ "epoch": 5.64,
927
+ "grad_norm": 1.3752835988998413,
928
+ "learning_rate": 0.0021788990825688075,
929
+ "loss": 0.6684,
930
+ "step": 1230
931
+ },
932
+ {
933
+ "epoch": 5.69,
934
+ "grad_norm": 1.3724400997161865,
935
+ "learning_rate": 0.0021559633027522936,
936
+ "loss": 0.6126,
937
+ "step": 1240
938
+ },
939
+ {
940
+ "epoch": 5.73,
941
+ "grad_norm": 1.1628080606460571,
942
+ "learning_rate": 0.0021330275229357798,
943
+ "loss": 0.6358,
944
+ "step": 1250
945
+ },
946
+ {
947
+ "epoch": 5.78,
948
+ "grad_norm": 1.222913384437561,
949
+ "learning_rate": 0.0021100917431192663,
950
+ "loss": 0.6124,
951
+ "step": 1260
952
+ },
953
+ {
954
+ "epoch": 5.83,
955
+ "grad_norm": 1.2353663444519043,
956
+ "learning_rate": 0.0020871559633027525,
957
+ "loss": 0.688,
958
+ "step": 1270
959
+ },
960
+ {
961
+ "epoch": 5.87,
962
+ "grad_norm": 1.0336887836456299,
963
+ "learning_rate": 0.0020642201834862386,
964
+ "loss": 0.6118,
965
+ "step": 1280
966
+ },
967
+ {
968
+ "epoch": 5.92,
969
+ "grad_norm": 0.934916079044342,
970
+ "learning_rate": 0.002041284403669725,
971
+ "loss": 0.6061,
972
+ "step": 1290
973
+ },
974
+ {
975
+ "epoch": 5.96,
976
+ "grad_norm": 1.282327651977539,
977
+ "learning_rate": 0.002018348623853211,
978
+ "loss": 0.5857,
979
+ "step": 1300
980
+ },
981
+ {
982
+ "epoch": 6.0,
983
+ "eval_accuracy": 0.8858075040783034,
984
+ "eval_f1": 0.8314901516993909,
985
+ "eval_loss": 0.2926943302154541,
986
+ "eval_precision": 0.8493368689933978,
987
+ "eval_recall": 0.8358247845047836,
988
+ "eval_runtime": 20.132,
989
+ "eval_samples_per_second": 121.796,
990
+ "eval_steps_per_second": 7.65,
991
+ "step": 1308
992
+ },
993
+ {
994
+ "epoch": 6.01,
995
+ "grad_norm": 1.3109623193740845,
996
+ "learning_rate": 0.001995412844036697,
997
+ "loss": 0.6379,
998
+ "step": 1310
999
+ },
1000
+ {
1001
+ "epoch": 6.06,
1002
+ "grad_norm": 1.0487977266311646,
1003
+ "learning_rate": 0.0019724770642201837,
1004
+ "loss": 0.6469,
1005
+ "step": 1320
1006
+ },
1007
+ {
1008
+ "epoch": 6.1,
1009
+ "grad_norm": 1.1113476753234863,
1010
+ "learning_rate": 0.0019495412844036698,
1011
+ "loss": 0.6182,
1012
+ "step": 1330
1013
+ },
1014
+ {
1015
+ "epoch": 6.15,
1016
+ "grad_norm": 1.2381951808929443,
1017
+ "learning_rate": 0.0019266055045871562,
1018
+ "loss": 0.6625,
1019
+ "step": 1340
1020
+ },
1021
+ {
1022
+ "epoch": 6.19,
1023
+ "grad_norm": 1.175887107849121,
1024
+ "learning_rate": 0.0019036697247706421,
1025
+ "loss": 0.6267,
1026
+ "step": 1350
1027
+ },
1028
+ {
1029
+ "epoch": 6.24,
1030
+ "grad_norm": 0.798713743686676,
1031
+ "learning_rate": 0.0018807339449541285,
1032
+ "loss": 0.5674,
1033
+ "step": 1360
1034
+ },
1035
+ {
1036
+ "epoch": 6.28,
1037
+ "grad_norm": 0.9393543004989624,
1038
+ "learning_rate": 0.0018577981651376147,
1039
+ "loss": 0.5958,
1040
+ "step": 1370
1041
+ },
1042
+ {
1043
+ "epoch": 6.33,
1044
+ "grad_norm": 1.7909319400787354,
1045
+ "learning_rate": 0.001834862385321101,
1046
+ "loss": 0.5627,
1047
+ "step": 1380
1048
+ },
1049
+ {
1050
+ "epoch": 6.38,
1051
+ "grad_norm": 1.0835124254226685,
1052
+ "learning_rate": 0.0018119266055045872,
1053
+ "loss": 0.5894,
1054
+ "step": 1390
1055
+ },
1056
+ {
1057
+ "epoch": 6.42,
1058
+ "grad_norm": 1.349327802658081,
1059
+ "learning_rate": 0.0017889908256880735,
1060
+ "loss": 0.5777,
1061
+ "step": 1400
1062
+ },
1063
+ {
1064
+ "epoch": 6.47,
1065
+ "grad_norm": 0.8337296843528748,
1066
+ "learning_rate": 0.0017660550458715597,
1067
+ "loss": 0.5706,
1068
+ "step": 1410
1069
+ },
1070
+ {
1071
+ "epoch": 6.51,
1072
+ "grad_norm": 1.2801483869552612,
1073
+ "learning_rate": 0.001743119266055046,
1074
+ "loss": 0.5262,
1075
+ "step": 1420
1076
+ },
1077
+ {
1078
+ "epoch": 6.56,
1079
+ "grad_norm": 1.4153425693511963,
1080
+ "learning_rate": 0.0017201834862385322,
1081
+ "loss": 0.5377,
1082
+ "step": 1430
1083
+ },
1084
+ {
1085
+ "epoch": 6.61,
1086
+ "grad_norm": 1.23189377784729,
1087
+ "learning_rate": 0.0016972477064220186,
1088
+ "loss": 0.6018,
1089
+ "step": 1440
1090
+ },
1091
+ {
1092
+ "epoch": 6.65,
1093
+ "grad_norm": 1.4015734195709229,
1094
+ "learning_rate": 0.0016743119266055045,
1095
+ "loss": 0.5903,
1096
+ "step": 1450
1097
+ },
1098
+ {
1099
+ "epoch": 6.7,
1100
+ "grad_norm": 1.4871268272399902,
1101
+ "learning_rate": 0.0016513761467889909,
1102
+ "loss": 0.562,
1103
+ "step": 1460
1104
+ },
1105
+ {
1106
+ "epoch": 6.74,
1107
+ "grad_norm": 1.0915515422821045,
1108
+ "learning_rate": 0.001628440366972477,
1109
+ "loss": 0.552,
1110
+ "step": 1470
1111
+ },
1112
+ {
1113
+ "epoch": 6.79,
1114
+ "grad_norm": 1.5078574419021606,
1115
+ "learning_rate": 0.0016055045871559634,
1116
+ "loss": 0.4712,
1117
+ "step": 1480
1118
+ },
1119
+ {
1120
+ "epoch": 6.83,
1121
+ "grad_norm": 1.771911859512329,
1122
+ "learning_rate": 0.0015825688073394495,
1123
+ "loss": 0.5985,
1124
+ "step": 1490
1125
+ },
1126
+ {
1127
+ "epoch": 6.88,
1128
+ "grad_norm": 1.1326086521148682,
1129
+ "learning_rate": 0.001559633027522936,
1130
+ "loss": 0.6038,
1131
+ "step": 1500
1132
+ },
1133
+ {
1134
+ "epoch": 6.93,
1135
+ "grad_norm": 0.9454457759857178,
1136
+ "learning_rate": 0.001536697247706422,
1137
+ "loss": 0.5471,
1138
+ "step": 1510
1139
+ },
1140
+ {
1141
+ "epoch": 6.97,
1142
+ "grad_norm": 1.146789312362671,
1143
+ "learning_rate": 0.0015137614678899084,
1144
+ "loss": 0.5607,
1145
+ "step": 1520
1146
+ },
1147
+ {
1148
+ "epoch": 7.0,
1149
+ "eval_accuracy": 0.9061990212071778,
1150
+ "eval_f1": 0.8415533276455389,
1151
+ "eval_loss": 0.22090552747249603,
1152
+ "eval_precision": 0.8658073048818948,
1153
+ "eval_recall": 0.854663440247606,
1154
+ "eval_runtime": 20.17,
1155
+ "eval_samples_per_second": 121.567,
1156
+ "eval_steps_per_second": 7.635,
1157
+ "step": 1526
1158
+ },
1159
+ {
1160
+ "epoch": 7.02,
1161
+ "grad_norm": 1.0613855123519897,
1162
+ "learning_rate": 0.0014908256880733946,
1163
+ "loss": 0.5094,
1164
+ "step": 1530
1165
+ },
1166
+ {
1167
+ "epoch": 7.06,
1168
+ "grad_norm": 0.9999263286590576,
1169
+ "learning_rate": 0.001467889908256881,
1170
+ "loss": 0.5519,
1171
+ "step": 1540
1172
+ },
1173
+ {
1174
+ "epoch": 7.11,
1175
+ "grad_norm": 1.2755348682403564,
1176
+ "learning_rate": 0.0014449541284403669,
1177
+ "loss": 0.5626,
1178
+ "step": 1550
1179
+ },
1180
+ {
1181
+ "epoch": 7.16,
1182
+ "grad_norm": 1.0304380655288696,
1183
+ "learning_rate": 0.0014220183486238532,
1184
+ "loss": 0.5547,
1185
+ "step": 1560
1186
+ },
1187
+ {
1188
+ "epoch": 7.2,
1189
+ "grad_norm": 1.4103654623031616,
1190
+ "learning_rate": 0.0013990825688073394,
1191
+ "loss": 0.5487,
1192
+ "step": 1570
1193
+ },
1194
+ {
1195
+ "epoch": 7.25,
1196
+ "grad_norm": 0.967018723487854,
1197
+ "learning_rate": 0.0013761467889908258,
1198
+ "loss": 0.5428,
1199
+ "step": 1580
1200
+ },
1201
+ {
1202
+ "epoch": 7.29,
1203
+ "grad_norm": 1.2113174200057983,
1204
+ "learning_rate": 0.001353211009174312,
1205
+ "loss": 0.5207,
1206
+ "step": 1590
1207
+ },
1208
+ {
1209
+ "epoch": 7.34,
1210
+ "grad_norm": 1.2477692365646362,
1211
+ "learning_rate": 0.0013302752293577983,
1212
+ "loss": 0.5647,
1213
+ "step": 1600
1214
+ },
1215
+ {
1216
+ "epoch": 7.39,
1217
+ "grad_norm": 0.9783982038497925,
1218
+ "learning_rate": 0.0013073394495412844,
1219
+ "loss": 0.5262,
1220
+ "step": 1610
1221
+ },
1222
+ {
1223
+ "epoch": 7.43,
1224
+ "grad_norm": 1.3188928365707397,
1225
+ "learning_rate": 0.0012844036697247708,
1226
+ "loss": 0.5012,
1227
+ "step": 1620
1228
+ },
1229
+ {
1230
+ "epoch": 7.48,
1231
+ "grad_norm": 1.1862777471542358,
1232
+ "learning_rate": 0.0012614678899082567,
1233
+ "loss": 0.5135,
1234
+ "step": 1630
1235
+ },
1236
+ {
1237
+ "epoch": 7.52,
1238
+ "grad_norm": 0.9528157114982605,
1239
+ "learning_rate": 0.0012385321100917433,
1240
+ "loss": 0.554,
1241
+ "step": 1640
1242
+ },
1243
+ {
1244
+ "epoch": 7.57,
1245
+ "grad_norm": 1.229379653930664,
1246
+ "learning_rate": 0.0012155963302752295,
1247
+ "loss": 0.4834,
1248
+ "step": 1650
1249
+ },
1250
+ {
1251
+ "epoch": 7.61,
1252
+ "grad_norm": 1.2559857368469238,
1253
+ "learning_rate": 0.0011926605504587156,
1254
+ "loss": 0.5633,
1255
+ "step": 1660
1256
+ },
1257
+ {
1258
+ "epoch": 7.66,
1259
+ "grad_norm": 1.423509120941162,
1260
+ "learning_rate": 0.001169724770642202,
1261
+ "loss": 0.5654,
1262
+ "step": 1670
1263
+ },
1264
+ {
1265
+ "epoch": 7.71,
1266
+ "grad_norm": 1.0073614120483398,
1267
+ "learning_rate": 0.0011467889908256881,
1268
+ "loss": 0.555,
1269
+ "step": 1680
1270
+ },
1271
+ {
1272
+ "epoch": 7.75,
1273
+ "grad_norm": 0.8332647085189819,
1274
+ "learning_rate": 0.0011238532110091745,
1275
+ "loss": 0.5221,
1276
+ "step": 1690
1277
+ },
1278
+ {
1279
+ "epoch": 7.8,
1280
+ "grad_norm": 1.2242189645767212,
1281
+ "learning_rate": 0.0011009174311926607,
1282
+ "loss": 0.5209,
1283
+ "step": 1700
1284
+ },
1285
+ {
1286
+ "epoch": 7.84,
1287
+ "grad_norm": 1.2133524417877197,
1288
+ "learning_rate": 0.0010779816513761468,
1289
+ "loss": 0.4905,
1290
+ "step": 1710
1291
+ },
1292
+ {
1293
+ "epoch": 7.89,
1294
+ "grad_norm": 1.3106974363327026,
1295
+ "learning_rate": 0.0010550458715596332,
1296
+ "loss": 0.5018,
1297
+ "step": 1720
1298
+ },
1299
+ {
1300
+ "epoch": 7.94,
1301
+ "grad_norm": 1.1411136388778687,
1302
+ "learning_rate": 0.0010321100917431193,
1303
+ "loss": 0.5594,
1304
+ "step": 1730
1305
+ },
1306
+ {
1307
+ "epoch": 7.98,
1308
+ "grad_norm": 1.3002750873565674,
1309
+ "learning_rate": 0.0010091743119266055,
1310
+ "loss": 0.5423,
1311
+ "step": 1740
1312
+ },
1313
+ {
1314
+ "epoch": 8.0,
1315
+ "eval_accuracy": 0.9025285481239804,
1316
+ "eval_f1": 0.8487271486984757,
1317
+ "eval_loss": 0.2513488829135895,
1318
+ "eval_precision": 0.854490909662593,
1319
+ "eval_recall": 0.847020693577074,
1320
+ "eval_runtime": 20.1419,
1321
+ "eval_samples_per_second": 121.736,
1322
+ "eval_steps_per_second": 7.646,
1323
+ "step": 1744
1324
+ },
1325
+ {
1326
+ "epoch": 8.03,
1327
+ "grad_norm": 0.9419348835945129,
1328
+ "learning_rate": 0.0009862385321100918,
1329
+ "loss": 0.4677,
1330
+ "step": 1750
1331
+ },
1332
+ {
1333
+ "epoch": 8.07,
1334
+ "grad_norm": 1.2686134576797485,
1335
+ "learning_rate": 0.0009633027522935781,
1336
+ "loss": 0.5112,
1337
+ "step": 1760
1338
+ },
1339
+ {
1340
+ "epoch": 8.12,
1341
+ "grad_norm": 1.0132619142532349,
1342
+ "learning_rate": 0.0009403669724770643,
1343
+ "loss": 0.4776,
1344
+ "step": 1770
1345
+ },
1346
+ {
1347
+ "epoch": 8.17,
1348
+ "grad_norm": 1.5143158435821533,
1349
+ "learning_rate": 0.0009174311926605505,
1350
+ "loss": 0.4824,
1351
+ "step": 1780
1352
+ },
1353
+ {
1354
+ "epoch": 8.21,
1355
+ "grad_norm": 0.9703628420829773,
1356
+ "learning_rate": 0.0008944954128440368,
1357
+ "loss": 0.5088,
1358
+ "step": 1790
1359
+ },
1360
+ {
1361
+ "epoch": 8.26,
1362
+ "grad_norm": 1.054370403289795,
1363
+ "learning_rate": 0.000871559633027523,
1364
+ "loss": 0.5356,
1365
+ "step": 1800
1366
+ },
1367
+ {
1368
+ "epoch": 8.3,
1369
+ "grad_norm": 1.4087867736816406,
1370
+ "learning_rate": 0.0008486238532110093,
1371
+ "loss": 0.4931,
1372
+ "step": 1810
1373
+ },
1374
+ {
1375
+ "epoch": 8.35,
1376
+ "grad_norm": 1.2010319232940674,
1377
+ "learning_rate": 0.0008256880733944954,
1378
+ "loss": 0.457,
1379
+ "step": 1820
1380
+ },
1381
+ {
1382
+ "epoch": 8.39,
1383
+ "grad_norm": 0.9890044927597046,
1384
+ "learning_rate": 0.0008027522935779817,
1385
+ "loss": 0.4841,
1386
+ "step": 1830
1387
+ },
1388
+ {
1389
+ "epoch": 8.44,
1390
+ "grad_norm": 1.102015733718872,
1391
+ "learning_rate": 0.000779816513761468,
1392
+ "loss": 0.5156,
1393
+ "step": 1840
1394
+ },
1395
+ {
1396
+ "epoch": 8.49,
1397
+ "grad_norm": 1.8185703754425049,
1398
+ "learning_rate": 0.0007568807339449542,
1399
+ "loss": 0.5208,
1400
+ "step": 1850
1401
+ },
1402
+ {
1403
+ "epoch": 8.53,
1404
+ "grad_norm": 1.0316917896270752,
1405
+ "learning_rate": 0.0007339449541284405,
1406
+ "loss": 0.4669,
1407
+ "step": 1860
1408
+ },
1409
+ {
1410
+ "epoch": 8.58,
1411
+ "grad_norm": 1.3456588983535767,
1412
+ "learning_rate": 0.0007110091743119266,
1413
+ "loss": 0.5149,
1414
+ "step": 1870
1415
+ },
1416
+ {
1417
+ "epoch": 8.62,
1418
+ "grad_norm": 1.1594740152359009,
1419
+ "learning_rate": 0.0006880733944954129,
1420
+ "loss": 0.4816,
1421
+ "step": 1880
1422
+ },
1423
+ {
1424
+ "epoch": 8.67,
1425
+ "grad_norm": 1.195693850517273,
1426
+ "learning_rate": 0.0006651376146788991,
1427
+ "loss": 0.4264,
1428
+ "step": 1890
1429
+ },
1430
+ {
1431
+ "epoch": 8.72,
1432
+ "grad_norm": 0.8687453269958496,
1433
+ "learning_rate": 0.0006422018348623854,
1434
+ "loss": 0.5055,
1435
+ "step": 1900
1436
+ },
1437
+ {
1438
+ "epoch": 8.76,
1439
+ "grad_norm": 1.019943118095398,
1440
+ "learning_rate": 0.0006192660550458717,
1441
+ "loss": 0.4862,
1442
+ "step": 1910
1443
+ },
1444
+ {
1445
+ "epoch": 8.81,
1446
+ "grad_norm": 1.0585637092590332,
1447
+ "learning_rate": 0.0005963302752293578,
1448
+ "loss": 0.4698,
1449
+ "step": 1920
1450
+ },
1451
+ {
1452
+ "epoch": 8.85,
1453
+ "grad_norm": 1.1640921831130981,
1454
+ "learning_rate": 0.0005733944954128441,
1455
+ "loss": 0.4725,
1456
+ "step": 1930
1457
+ },
1458
+ {
1459
+ "epoch": 8.9,
1460
+ "grad_norm": 1.0359253883361816,
1461
+ "learning_rate": 0.0005504587155963303,
1462
+ "loss": 0.4677,
1463
+ "step": 1940
1464
+ },
1465
+ {
1466
+ "epoch": 8.94,
1467
+ "grad_norm": 0.795647382736206,
1468
+ "learning_rate": 0.0005275229357798166,
1469
+ "loss": 0.4597,
1470
+ "step": 1950
1471
+ },
1472
+ {
1473
+ "epoch": 8.99,
1474
+ "grad_norm": 0.8944999575614929,
1475
+ "learning_rate": 0.0005045871559633027,
1476
+ "loss": 0.4053,
1477
+ "step": 1960
1478
+ },
1479
+ {
1480
+ "epoch": 9.0,
1481
+ "eval_accuracy": 0.9037520391517129,
1482
+ "eval_f1": 0.8372664360619215,
1483
+ "eval_loss": 0.256144642829895,
1484
+ "eval_precision": 0.8543098842069452,
1485
+ "eval_recall": 0.8456765717856151,
1486
+ "eval_runtime": 20.1301,
1487
+ "eval_samples_per_second": 121.808,
1488
+ "eval_steps_per_second": 7.65,
1489
+ "step": 1962
1490
+ },
1491
+ {
1492
+ "epoch": 9.04,
1493
+ "grad_norm": 0.7914834022521973,
1494
+ "learning_rate": 0.00048165137614678905,
1495
+ "loss": 0.5022,
1496
+ "step": 1970
1497
+ },
1498
+ {
1499
+ "epoch": 9.08,
1500
+ "grad_norm": 1.254602313041687,
1501
+ "learning_rate": 0.00045871559633027525,
1502
+ "loss": 0.4237,
1503
+ "step": 1980
1504
+ },
1505
+ {
1506
+ "epoch": 9.13,
1507
+ "grad_norm": 1.0266820192337036,
1508
+ "learning_rate": 0.0004357798165137615,
1509
+ "loss": 0.4579,
1510
+ "step": 1990
1511
+ },
1512
+ {
1513
+ "epoch": 9.17,
1514
+ "grad_norm": 0.948698103427887,
1515
+ "learning_rate": 0.0004128440366972477,
1516
+ "loss": 0.4518,
1517
+ "step": 2000
1518
+ },
1519
+ {
1520
+ "epoch": 9.22,
1521
+ "grad_norm": 1.4235280752182007,
1522
+ "learning_rate": 0.000389908256880734,
1523
+ "loss": 0.4189,
1524
+ "step": 2010
1525
+ },
1526
+ {
1527
+ "epoch": 9.27,
1528
+ "grad_norm": 1.2883589267730713,
1529
+ "learning_rate": 0.00036697247706422024,
1530
+ "loss": 0.4441,
1531
+ "step": 2020
1532
+ },
1533
+ {
1534
+ "epoch": 9.31,
1535
+ "grad_norm": 1.0829825401306152,
1536
+ "learning_rate": 0.00034403669724770644,
1537
+ "loss": 0.3735,
1538
+ "step": 2030
1539
+ },
1540
+ {
1541
+ "epoch": 9.36,
1542
+ "grad_norm": 1.0987470149993896,
1543
+ "learning_rate": 0.0003211009174311927,
1544
+ "loss": 0.4356,
1545
+ "step": 2040
1546
+ },
1547
+ {
1548
+ "epoch": 9.4,
1549
+ "grad_norm": 1.0250358581542969,
1550
+ "learning_rate": 0.0002981651376146789,
1551
+ "loss": 0.4606,
1552
+ "step": 2050
1553
+ },
1554
+ {
1555
+ "epoch": 9.45,
1556
+ "grad_norm": 0.8266171216964722,
1557
+ "learning_rate": 0.00027522935779816516,
1558
+ "loss": 0.4216,
1559
+ "step": 2060
1560
+ },
1561
+ {
1562
+ "epoch": 9.5,
1563
+ "grad_norm": 0.7694286108016968,
1564
+ "learning_rate": 0.00025229357798165137,
1565
+ "loss": 0.4054,
1566
+ "step": 2070
1567
+ },
1568
+ {
1569
+ "epoch": 9.54,
1570
+ "grad_norm": 1.0557419061660767,
1571
+ "learning_rate": 0.00022935779816513763,
1572
+ "loss": 0.4145,
1573
+ "step": 2080
1574
+ },
1575
+ {
1576
+ "epoch": 9.59,
1577
+ "grad_norm": 1.1996885538101196,
1578
+ "learning_rate": 0.00020871559633027525,
1579
+ "loss": 0.4446,
1580
+ "step": 2090
1581
+ },
1582
+ {
1583
+ "epoch": 9.63,
1584
+ "grad_norm": 0.971155047416687,
1585
+ "learning_rate": 0.00018577981651376148,
1586
+ "loss": 0.4329,
1587
+ "step": 2100
1588
+ },
1589
+ {
1590
+ "epoch": 9.68,
1591
+ "grad_norm": 1.243017554283142,
1592
+ "learning_rate": 0.0001628440366972477,
1593
+ "loss": 0.386,
1594
+ "step": 2110
1595
+ },
1596
+ {
1597
+ "epoch": 9.72,
1598
+ "grad_norm": 0.9568091034889221,
1599
+ "learning_rate": 0.00013990825688073395,
1600
+ "loss": 0.4192,
1601
+ "step": 2120
1602
+ },
1603
+ {
1604
+ "epoch": 9.77,
1605
+ "grad_norm": 0.8637120723724365,
1606
+ "learning_rate": 0.00011697247706422019,
1607
+ "loss": 0.414,
1608
+ "step": 2130
1609
+ },
1610
+ {
1611
+ "epoch": 9.82,
1612
+ "grad_norm": 1.4437438249588013,
1613
+ "learning_rate": 9.403669724770644e-05,
1614
+ "loss": 0.4716,
1615
+ "step": 2140
1616
+ },
1617
+ {
1618
+ "epoch": 9.86,
1619
+ "grad_norm": 0.9072945713996887,
1620
+ "learning_rate": 7.110091743119267e-05,
1621
+ "loss": 0.4339,
1622
+ "step": 2150
1623
+ },
1624
+ {
1625
+ "epoch": 9.91,
1626
+ "grad_norm": 1.0016337633132935,
1627
+ "learning_rate": 4.81651376146789e-05,
1628
+ "loss": 0.4118,
1629
+ "step": 2160
1630
+ },
1631
+ {
1632
+ "epoch": 9.95,
1633
+ "grad_norm": 1.1547799110412598,
1634
+ "learning_rate": 2.5229357798165138e-05,
1635
+ "loss": 0.4367,
1636
+ "step": 2170
1637
+ },
1638
+ {
1639
+ "epoch": 10.0,
1640
+ "grad_norm": 1.1134579181671143,
1641
+ "learning_rate": 2.2935779816513764e-06,
1642
+ "loss": 0.4417,
1643
+ "step": 2180
1644
+ },
1645
+ {
1646
+ "epoch": 10.0,
1647
+ "eval_accuracy": 0.899673735725938,
1648
+ "eval_f1": 0.8415790365990568,
1649
+ "eval_loss": 0.2557845115661621,
1650
+ "eval_precision": 0.8463460685420455,
1651
+ "eval_recall": 0.8395151187215174,
1652
+ "eval_runtime": 20.1334,
1653
+ "eval_samples_per_second": 121.788,
1654
+ "eval_steps_per_second": 7.649,
1655
+ "step": 2180
1656
+ },
1657
+ {
1658
+ "epoch": 10.0,
1659
+ "step": 2180,
1660
+ "total_flos": 1.0878579515820442e+19,
1661
+ "train_loss": 0.69453261346992,
1662
+ "train_runtime": 2383.9912,
1663
+ "train_samples_per_second": 58.473,
1664
+ "train_steps_per_second": 0.914
1665
+ }
1666
+ ],
1667
+ "logging_steps": 10,
1668
+ "max_steps": 2180,
1669
+ "num_input_tokens_seen": 0,
1670
+ "num_train_epochs": 10,
1671
+ "save_steps": 500,
1672
+ "total_flos": 1.0878579515820442e+19,
1673
+ "train_batch_size": 16,
1674
+ "trial_name": null,
1675
+ "trial_params": null
1676
+ }
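
trainer_state.json above logs one training entry every 10 steps plus an evaluation block per epoch; its best_metric (eval_accuracy ≈ 0.9062) points at checkpoint-1526. A short sketch for pulling the per-epoch evaluation metrics out of log_history, assuming the file has been downloaded locally:

```python
# Sketch: extract per-epoch evaluation metrics from trainer_state.json.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

eval_rows = [e for e in state["log_history"] if "eval_accuracy" in e]
for row in eval_rows:
    print(f"epoch {row['epoch']:>4}: "
          f"acc={row['eval_accuracy']:.4f}  f1={row['eval_f1']:.4f}")

print("best:", state["best_metric"], "at", state["best_model_checkpoint"])
```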