sharren committed on
Commit
4472b74
1 Parent(s): 0737bea

🍻 cheers

README.md CHANGED
@@ -2,6 +2,7 @@
  license: apache-2.0
  base_model: google/vit-base-patch16-224
  tags:
+ - image-classification
  - generated_from_trainer
  metrics:
  - accuracy
@@ -18,13 +19,13 @@ should probably proofread and complete it, then remove this comment. -->
 
  # vit-lr-inverse-sqrt
 
- This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on an unknown dataset.
+ This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on the skin-cancer dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.6353
- - Accuracy: 0.8755
- - Precision: 0.8751
- - Recall: 0.8755
- - F1: 0.8725
+ - Loss: 0.4469
+ - Accuracy: 0.8499
+ - Precision: 0.8565
+ - Recall: 0.8499
+ - F1: 0.8516
 
  ## Model description
 
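Note on the metrics above: in both versions of the README the Recall value equals the Accuracy value (0.8755 before, 0.8499 after). That is what support-weighted averaging produces, because weighted recall reduces algebraically to overall accuracy. The averaging mode is not stated in this commit, so treat it as an assumption. The sketch below uses invented toy labels and a hypothetical helper `weighted_scores` to show the identity in plain Python.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall and F1.

    Hypothetical helper for illustration; not taken from the training script.
    """
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(hits.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (not from the skin-cancer dataset).
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "a", "c"]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

Weighted recall sums `(count / n) * (hits / count)` over classes, which collapses to `total hits / n`, the accuracy. Precision and F1 do not collapse this way, which fits their differing values in the README.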
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 5.3,
+ "eval_accuracy": 0.8498613037447988,
+ "eval_f1": 0.8515512179522667,
+ "eval_loss": 0.44692692160606384,
+ "eval_precision": 0.856522763832034,
+ "eval_recall": 0.8498613037447988,
+ "eval_runtime": 37.3402,
+ "eval_samples_per_second": 77.236,
+ "eval_steps_per_second": 9.668,
+ "total_flos": 2.1047767559471923e+18,
+ "train_loss": 0.21456260421779005,
+ "train_runtime": 1227.9557,
+ "train_samples_per_second": 417.605,
+ "train_steps_per_second": 26.141
+ }
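Note on all_results.json: the throughput fields allow a rough cross-check of the evaluation set size, which this commit does not state directly. The arithmetic below is a sketch that uses only the values above. The rounding step assumes the reported rates were derived from whole sample and batch counts.

```python
# Values copied from all_results.json in this commit.
eval_runtime = 37.3402              # seconds
eval_samples_per_second = 77.236
eval_steps_per_second = 9.668
eval_accuracy = 0.8498613037447988

# Rate * time gives a count; round because the rates are printed to 3 decimals.
n_eval = round(eval_runtime * eval_samples_per_second)
n_batches = round(eval_runtime * eval_steps_per_second)

# Number of correct predictions implied by the accuracy.
n_correct = round(eval_accuracy * n_eval)

print(n_eval, n_batches, n_correct)
print(n_correct / n_eval)           # should reproduce eval_accuracy closely
```

If the inferred counts are right, `n_correct / n_eval` should match `eval_accuracy` to many decimal places, which is a quick sanity check on the estimate.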
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 5.3,
+ "eval_accuracy": 0.8498613037447988,
+ "eval_f1": 0.8515512179522667,
+ "eval_loss": 0.44692692160606384,
+ "eval_precision": 0.856522763832034,
+ "eval_recall": 0.8498613037447988,
+ "eval_runtime": 37.3402,
+ "eval_samples_per_second": 77.236,
+ "eval_steps_per_second": 9.668
+ }
runs/Mar19_05-57-04_6492c5bf3fae/events.out.tfevents.1710829161.6492c5bf3fae.6515.7 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:722b3ad70bd73a57d10c963d9971a604b3a146f600a192ab87fd18c1ec60a615
+ size 560
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 5.3,
+ "total_flos": 2.1047767559471923e+18,
+ "train_loss": 0.21456260421779005,
+ "train_runtime": 1227.9557,
+ "train_samples_per_second": 417.605,
+ "train_steps_per_second": 26.141
+ }
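Note on the learning-rate schedule: the `learning_rate` values logged in trainer_state.json rise linearly and then decay in a way that matches an inverse-square-root schedule with linear warmup, as the model name suggests. The sketch below is not the training code. The peak rate of 1e-4 and the 80 warmup steps are assumptions inferred from the logged values, and `inverse_sqrt_lr` is a hypothetical helper. The early log entries line up with the schedule evaluated at `step - 1`.

```python
import math


def inverse_sqrt_lr(step, peak_lr=1e-4, warmup_steps=80):
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step).

    peak_lr and warmup_steps are inferred from the log, not read from a config.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * math.sqrt(warmup_steps / step)


# (logged "step", logged "learning_rate") pairs copied from trainer_state.json.
logged = [
    (10, 1.125e-05),
    (80, 9.875000000000002e-05),
    (90, 9.480909262799545e-05),
    (100, 8.989331499509895e-05),
    (200, 6.340426249482415e-05),
]

for step, lr in logged:
    # Compare each logged value with the schedule one step earlier.
    print(step, lr, inverse_sqrt_lr(step - 1))
```

From the step-250 entry onward the logged values match `step - 2` rather than `step - 1`. That would be consistent with one skipped scheduler update, though the commit does not say what caused it.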
trainer_state.json ADDED
@@ -0,0 +1,1424 @@
1
+ {
2
+ "best_metric": 0.44692692160606384,
3
+ "best_model_checkpoint": "./vit-lr-inverse-sqrt/checkpoint-700",
4
+ "epoch": 5.29595015576324,
5
+ "eval_steps": 100,
6
+ "global_step": 1700,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.03,
13
+ "grad_norm": 17.940969467163086,
14
+ "learning_rate": 1.125e-05,
15
+ "loss": 2.0172,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.06,
20
+ "grad_norm": 4.308961391448975,
21
+ "learning_rate": 2.375e-05,
22
+ "loss": 1.1159,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.09,
27
+ "grad_norm": 5.38205099105835,
28
+ "learning_rate": 3.625e-05,
29
+ "loss": 1.1398,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.12,
34
+ "grad_norm": 5.569328308105469,
35
+ "learning_rate": 4.875e-05,
36
+ "loss": 1.0508,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.16,
41
+ "grad_norm": 5.870121002197266,
42
+ "learning_rate": 6.125000000000001e-05,
43
+ "loss": 0.8095,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.19,
48
+ "grad_norm": 6.100069046020508,
49
+ "learning_rate": 7.375e-05,
50
+ "loss": 0.8756,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.22,
55
+ "grad_norm": 4.655179023742676,
56
+ "learning_rate": 8.625000000000001e-05,
57
+ "loss": 0.9221,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.25,
62
+ "grad_norm": 4.762995719909668,
63
+ "learning_rate": 9.875000000000002e-05,
64
+ "loss": 0.6852,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.28,
69
+ "grad_norm": 5.286139011383057,
70
+ "learning_rate": 9.480909262799545e-05,
71
+ "loss": 0.6655,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.31,
76
+ "grad_norm": 4.113163471221924,
77
+ "learning_rate": 8.989331499509895e-05,
78
+ "loss": 0.6694,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.31,
83
+ "eval_accuracy": 0.7631761442441054,
84
+ "eval_f1": 0.7499001805196772,
85
+ "eval_loss": 0.6511249542236328,
86
+ "eval_precision": 0.7557663051217941,
87
+ "eval_recall": 0.7631761442441054,
88
+ "eval_runtime": 36.5742,
89
+ "eval_samples_per_second": 78.853,
90
+ "eval_steps_per_second": 9.87,
91
+ "step": 100
92
+ },
93
+ {
94
+ "epoch": 0.34,
95
+ "grad_norm": 3.8087217807769775,
96
+ "learning_rate": 8.567058737562387e-05,
97
+ "loss": 0.6742,
98
+ "step": 110
99
+ },
100
+ {
101
+ "epoch": 0.37,
102
+ "grad_norm": 8.133487701416016,
103
+ "learning_rate": 8.199200616907878e-05,
104
+ "loss": 0.6546,
105
+ "step": 120
106
+ },
107
+ {
108
+ "epoch": 0.4,
109
+ "grad_norm": 7.749859809875488,
110
+ "learning_rate": 7.874992309581578e-05,
111
+ "loss": 0.6175,
112
+ "step": 130
113
+ },
114
+ {
115
+ "epoch": 0.44,
116
+ "grad_norm": 3.3360700607299805,
117
+ "learning_rate": 7.586432418108816e-05,
118
+ "loss": 0.4904,
119
+ "step": 140
120
+ },
121
+ {
122
+ "epoch": 0.47,
123
+ "grad_norm": 4.42198371887207,
124
+ "learning_rate": 7.327433054473117e-05,
125
+ "loss": 0.5958,
126
+ "step": 150
127
+ },
128
+ {
129
+ "epoch": 0.5,
130
+ "grad_norm": 6.188783168792725,
131
+ "learning_rate": 7.093269021319087e-05,
132
+ "loss": 0.5494,
133
+ "step": 160
134
+ },
135
+ {
136
+ "epoch": 0.53,
137
+ "grad_norm": 4.339505672454834,
138
+ "learning_rate": 6.880209161537815e-05,
139
+ "loss": 0.5787,
140
+ "step": 170
141
+ },
142
+ {
143
+ "epoch": 0.56,
144
+ "grad_norm": 7.221861362457275,
145
+ "learning_rate": 6.685262704648756e-05,
146
+ "loss": 0.6344,
147
+ "step": 180
148
+ },
149
+ {
150
+ "epoch": 0.59,
151
+ "grad_norm": 6.194063663482666,
152
+ "learning_rate": 6.506000486323555e-05,
153
+ "loss": 0.5987,
154
+ "step": 190
155
+ },
156
+ {
157
+ "epoch": 0.62,
158
+ "grad_norm": 5.874014854431152,
159
+ "learning_rate": 6.340426249482415e-05,
160
+ "loss": 0.5468,
161
+ "step": 200
162
+ },
163
+ {
164
+ "epoch": 0.62,
165
+ "eval_accuracy": 0.7617891816920943,
166
+ "eval_f1": 0.7109169676846827,
167
+ "eval_loss": 0.6536840796470642,
168
+ "eval_precision": 0.7713261180604432,
169
+ "eval_recall": 0.7617891816920943,
170
+ "eval_runtime": 36.698,
171
+ "eval_samples_per_second": 78.587,
172
+ "eval_steps_per_second": 9.837,
173
+ "step": 200
174
+ },
175
+ {
176
+ "epoch": 0.65,
177
+ "grad_norm": 4.4315361976623535,
178
+ "learning_rate": 6.18688224889746e-05,
179
+ "loss": 0.6049,
180
+ "step": 210
181
+ },
182
+ {
183
+ "epoch": 0.69,
184
+ "grad_norm": 6.3696699142456055,
185
+ "learning_rate": 6.043978852154994e-05,
186
+ "loss": 0.6266,
187
+ "step": 220
188
+ },
189
+ {
190
+ "epoch": 0.72,
191
+ "grad_norm": 4.233091354370117,
192
+ "learning_rate": 5.910541245655418e-05,
193
+ "loss": 0.3634,
194
+ "step": 230
195
+ },
196
+ {
197
+ "epoch": 0.75,
198
+ "grad_norm": 2.4452743530273438,
199
+ "learning_rate": 5.7855685414037173e-05,
200
+ "loss": 0.5039,
201
+ "step": 240
202
+ },
203
+ {
204
+ "epoch": 0.78,
205
+ "grad_norm": 6.689948558807373,
206
+ "learning_rate": 5.6796183424706484e-05,
207
+ "loss": 0.6249,
208
+ "step": 250
209
+ },
210
+ {
211
+ "epoch": 0.81,
212
+ "grad_norm": 5.241004467010498,
213
+ "learning_rate": 5.568460463897046e-05,
214
+ "loss": 0.4695,
215
+ "step": 260
216
+ },
217
+ {
218
+ "epoch": 0.84,
219
+ "grad_norm": 3.0335886478424072,
220
+ "learning_rate": 5.46358364708153e-05,
221
+ "loss": 0.4405,
222
+ "step": 270
223
+ },
224
+ {
225
+ "epoch": 0.87,
226
+ "grad_norm": 4.185052394866943,
227
+ "learning_rate": 5.364417807858201e-05,
228
+ "loss": 0.4555,
229
+ "step": 280
230
+ },
231
+ {
232
+ "epoch": 0.9,
233
+ "grad_norm": 7.301243305206299,
234
+ "learning_rate": 5.270462766947299e-05,
235
+ "loss": 0.5954,
236
+ "step": 290
237
+ },
238
+ {
239
+ "epoch": 0.93,
240
+ "grad_norm": 5.7551798820495605,
241
+ "learning_rate": 5.181277601508398e-05,
242
+ "loss": 0.6132,
243
+ "step": 300
244
+ },
245
+ {
246
+ "epoch": 0.93,
247
+ "eval_accuracy": 0.8144937586685159,
248
+ "eval_f1": 0.813842282595786,
249
+ "eval_loss": 0.5131940841674805,
250
+ "eval_precision": 0.8261935513216945,
251
+ "eval_recall": 0.8144937586685159,
252
+ "eval_runtime": 35.4672,
253
+ "eval_samples_per_second": 81.314,
254
+ "eval_steps_per_second": 10.178,
255
+ "step": 300
256
+ },
257
+ {
258
+ "epoch": 0.97,
259
+ "grad_norm": 3.989379405975342,
260
+ "learning_rate": 5.0964719143762554e-05,
261
+ "loss": 0.4919,
262
+ "step": 310
263
+ },
264
+ {
265
+ "epoch": 1.0,
266
+ "grad_norm": 6.611064910888672,
267
+ "learning_rate": 5.015698625755192e-05,
268
+ "loss": 0.4443,
269
+ "step": 320
270
+ },
271
+ {
272
+ "epoch": 1.03,
273
+ "grad_norm": 3.822683334350586,
274
+ "learning_rate": 4.9386479832479486e-05,
275
+ "loss": 0.4072,
276
+ "step": 330
277
+ },
278
+ {
279
+ "epoch": 1.06,
280
+ "grad_norm": 3.768205165863037,
281
+ "learning_rate": 4.865042554105199e-05,
282
+ "loss": 0.2584,
283
+ "step": 340
284
+ },
285
+ {
286
+ "epoch": 1.09,
287
+ "grad_norm": 1.0489230155944824,
288
+ "learning_rate": 4.794633014853842e-05,
289
+ "loss": 0.3173,
290
+ "step": 350
291
+ },
292
+ {
293
+ "epoch": 1.12,
294
+ "grad_norm": 3.1184098720550537,
295
+ "learning_rate": 4.727194592470655e-05,
296
+ "loss": 0.2359,
297
+ "step": 360
298
+ },
299
+ {
300
+ "epoch": 1.15,
301
+ "grad_norm": 2.9331960678100586,
302
+ "learning_rate": 4.662524041201569e-05,
303
+ "loss": 0.2935,
304
+ "step": 370
305
+ },
306
+ {
307
+ "epoch": 1.18,
308
+ "grad_norm": 10.099534034729004,
309
+ "learning_rate": 4.600437062282362e-05,
310
+ "loss": 0.2998,
311
+ "step": 380
312
+ },
313
+ {
314
+ "epoch": 1.21,
315
+ "grad_norm": 4.351399898529053,
316
+ "learning_rate": 4.540766091864998e-05,
317
+ "loss": 0.4389,
318
+ "step": 390
319
+ },
320
+ {
321
+ "epoch": 1.25,
322
+ "grad_norm": 2.582808256149292,
323
+ "learning_rate": 4.4833583966222034e-05,
324
+ "loss": 0.3319,
325
+ "step": 400
326
+ },
327
+ {
328
+ "epoch": 1.25,
329
+ "eval_accuracy": 0.8307905686546463,
330
+ "eval_f1": 0.829309725170082,
331
+ "eval_loss": 0.47056975960731506,
332
+ "eval_precision": 0.8327103233714703,
333
+ "eval_recall": 0.8307905686546463,
334
+ "eval_runtime": 36.3423,
335
+ "eval_samples_per_second": 79.357,
336
+ "eval_steps_per_second": 9.933,
337
+ "step": 400
338
+ },
339
+ {
340
+ "epoch": 1.28,
341
+ "grad_norm": 5.291513919830322,
342
+ "learning_rate": 4.428074427700477e-05,
343
+ "loss": 0.2839,
344
+ "step": 410
345
+ },
346
+ {
347
+ "epoch": 1.31,
348
+ "grad_norm": 5.357462406158447,
349
+ "learning_rate": 4.3747863925980715e-05,
350
+ "loss": 0.3073,
351
+ "step": 420
352
+ },
353
+ {
354
+ "epoch": 1.34,
355
+ "grad_norm": 4.056901454925537,
356
+ "learning_rate": 4.32337701167117e-05,
357
+ "loss": 0.3814,
358
+ "step": 430
359
+ },
360
+ {
361
+ "epoch": 1.37,
362
+ "grad_norm": 4.224523544311523,
363
+ "learning_rate": 4.273738431706883e-05,
364
+ "loss": 0.2582,
365
+ "step": 440
366
+ },
367
+ {
368
+ "epoch": 1.4,
369
+ "grad_norm": 5.398918628692627,
370
+ "learning_rate": 4.225771273642583e-05,
371
+ "loss": 0.3279,
372
+ "step": 450
373
+ },
374
+ {
375
+ "epoch": 1.43,
376
+ "grad_norm": 4.845595359802246,
377
+ "learning_rate": 4.179383795285729e-05,
378
+ "loss": 0.2912,
379
+ "step": 460
380
+ },
381
+ {
382
+ "epoch": 1.46,
383
+ "grad_norm": 3.754013776779175,
384
+ "learning_rate": 4.1344911529736155e-05,
385
+ "loss": 0.224,
386
+ "step": 470
387
+ },
388
+ {
389
+ "epoch": 1.5,
390
+ "grad_norm": 9.241665840148926,
391
+ "learning_rate": 4.0910147486461317e-05,
392
+ "loss": 0.3102,
393
+ "step": 480
394
+ },
395
+ {
396
+ "epoch": 1.53,
397
+ "grad_norm": 6.650195598602295,
398
+ "learning_rate": 4.0488816508945806e-05,
399
+ "loss": 0.3613,
400
+ "step": 490
401
+ },
402
+ {
403
+ "epoch": 1.56,
404
+ "grad_norm": 2.6371705532073975,
405
+ "learning_rate": 4.008024080281012e-05,
406
+ "loss": 0.2286,
407
+ "step": 500
408
+ },
409
+ {
410
+ "epoch": 1.56,
411
+ "eval_accuracy": 0.8352981969486823,
412
+ "eval_f1": 0.8225709280993477,
413
+ "eval_loss": 0.4952014982700348,
414
+ "eval_precision": 0.8446619811727021,
415
+ "eval_recall": 0.8352981969486823,
416
+ "eval_runtime": 37.026,
417
+ "eval_samples_per_second": 77.891,
418
+ "eval_steps_per_second": 9.75,
419
+ "step": 500
420
+ },
421
+ {
422
+ "epoch": 1.59,
423
+ "grad_norm": 2.9743077754974365,
424
+ "learning_rate": 3.9683789506627256e-05,
425
+ "loss": 0.3705,
426
+ "step": 510
427
+ },
428
+ {
429
+ "epoch": 1.62,
430
+ "grad_norm": 3.7261903285980225,
431
+ "learning_rate": 3.929887459459297e-05,
432
+ "loss": 0.305,
433
+ "step": 520
434
+ },
435
+ {
436
+ "epoch": 1.65,
437
+ "grad_norm": 6.446900367736816,
438
+ "learning_rate": 3.892494720807615e-05,
439
+ "loss": 0.278,
440
+ "step": 530
441
+ },
442
+ {
443
+ "epoch": 1.68,
444
+ "grad_norm": 2.1284918785095215,
445
+ "learning_rate": 3.856149436398495e-05,
446
+ "loss": 0.2053,
447
+ "step": 540
448
+ },
449
+ {
450
+ "epoch": 1.71,
451
+ "grad_norm": 4.4590277671813965,
452
+ "learning_rate": 3.8208035995043505e-05,
453
+ "loss": 0.3642,
454
+ "step": 550
455
+ },
456
+ {
457
+ "epoch": 1.74,
458
+ "grad_norm": 4.997034549713135,
459
+ "learning_rate": 3.786412228313765e-05,
460
+ "loss": 0.3187,
461
+ "step": 560
462
+ },
463
+ {
464
+ "epoch": 1.78,
465
+ "grad_norm": 2.7240359783172607,
466
+ "learning_rate": 3.752933125204008e-05,
467
+ "loss": 0.2594,
468
+ "step": 570
469
+ },
470
+ {
471
+ "epoch": 1.81,
472
+ "grad_norm": 4.143340110778809,
473
+ "learning_rate": 3.720326659021623e-05,
474
+ "loss": 0.2151,
475
+ "step": 580
476
+ },
477
+ {
478
+ "epoch": 1.84,
479
+ "grad_norm": 5.146928787231445,
480
+ "learning_rate": 3.688555567816587e-05,
481
+ "loss": 0.3299,
482
+ "step": 590
483
+ },
484
+ {
485
+ "epoch": 1.87,
486
+ "grad_norm": 2.214599132537842,
487
+ "learning_rate": 3.6575847797972757e-05,
488
+ "loss": 0.2299,
489
+ "step": 600
490
+ },
491
+ {
492
+ "epoch": 1.87,
493
+ "eval_accuracy": 0.8366851595006934,
494
+ "eval_f1": 0.8357987474414161,
495
+ "eval_loss": 0.46962958574295044,
496
+ "eval_precision": 0.8516509479787296,
497
+ "eval_recall": 0.8366851595006934,
498
+ "eval_runtime": 35.6912,
499
+ "eval_samples_per_second": 80.804,
500
+ "eval_steps_per_second": 10.115,
501
+ "step": 600
502
+ },
503
+ {
504
+ "epoch": 1.9,
505
+ "grad_norm": 3.345350980758667,
506
+ "learning_rate": 3.627381250550059e-05,
507
+ "loss": 0.273,
508
+ "step": 610
509
+ },
510
+ {
511
+ "epoch": 1.93,
512
+ "grad_norm": 5.383599758148193,
513
+ "learning_rate": 3.597913814805773e-05,
514
+ "loss": 0.2903,
515
+ "step": 620
516
+ },
517
+ {
518
+ "epoch": 1.96,
519
+ "grad_norm": 3.014591932296753,
520
+ "learning_rate": 3.5691530512412484e-05,
521
+ "loss": 0.2079,
522
+ "step": 630
523
+ },
524
+ {
525
+ "epoch": 1.99,
526
+ "grad_norm": 3.041154146194458,
527
+ "learning_rate": 3.541071158982556e-05,
528
+ "loss": 0.2614,
529
+ "step": 640
530
+ },
531
+ {
532
+ "epoch": 2.02,
533
+ "grad_norm": 3.956765651702881,
534
+ "learning_rate": 3.513641844631533e-05,
535
+ "loss": 0.1976,
536
+ "step": 650
537
+ },
538
+ {
539
+ "epoch": 2.06,
540
+ "grad_norm": 4.5174736976623535,
541
+ "learning_rate": 3.4868402187720335e-05,
542
+ "loss": 0.1183,
543
+ "step": 660
544
+ },
545
+ {
546
+ "epoch": 2.09,
547
+ "grad_norm": 1.5246050357818604,
548
+ "learning_rate": 3.460642701029914e-05,
549
+ "loss": 0.1483,
550
+ "step": 670
551
+ },
552
+ {
553
+ "epoch": 2.12,
554
+ "grad_norm": 4.209073066711426,
555
+ "learning_rate": 3.435026932863631e-05,
556
+ "loss": 0.1136,
557
+ "step": 680
558
+ },
559
+ {
560
+ "epoch": 2.15,
561
+ "grad_norm": 3.5777525901794434,
562
+ "learning_rate": 3.4099716973523676e-05,
563
+ "loss": 0.0993,
564
+ "step": 690
565
+ },
566
+ {
567
+ "epoch": 2.18,
568
+ "grad_norm": 4.207378387451172,
569
+ "learning_rate": 3.385456845327663e-05,
570
+ "loss": 0.0542,
571
+ "step": 700
572
+ },
573
+ {
574
+ "epoch": 2.18,
575
+ "eval_accuracy": 0.8498613037447988,
576
+ "eval_f1": 0.8515512179522667,
577
+ "eval_loss": 0.44692692160606384,
578
+ "eval_precision": 0.856522763832034,
579
+ "eval_recall": 0.8498613037447988,
580
+ "eval_runtime": 35.9166,
581
+ "eval_samples_per_second": 80.297,
582
+ "eval_steps_per_second": 10.051,
583
+ "step": 700
584
+ },
585
+ {
586
+ "epoch": 2.21,
587
+ "grad_norm": 2.78676438331604,
588
+ "learning_rate": 3.361463227264072e-05,
589
+ "loss": 0.1083,
590
+ "step": 710
591
+ },
592
+ {
593
+ "epoch": 2.24,
594
+ "grad_norm": 3.2999024391174316,
595
+ "learning_rate": 3.337972630405625e-05,
596
+ "loss": 0.1755,
597
+ "step": 720
598
+ },
599
+ {
600
+ "epoch": 2.27,
601
+ "grad_norm": 4.462038040161133,
602
+ "learning_rate": 3.3149677206589793e-05,
603
+ "loss": 0.1548,
604
+ "step": 730
605
+ },
606
+ {
607
+ "epoch": 2.31,
608
+ "grad_norm": 2.0917258262634277,
609
+ "learning_rate": 3.2924319888319655e-05,
610
+ "loss": 0.0877,
611
+ "step": 740
612
+ },
613
+ {
614
+ "epoch": 2.34,
615
+ "grad_norm": 4.596163272857666,
616
+ "learning_rate": 3.2703497008386434e-05,
617
+ "loss": 0.1115,
618
+ "step": 750
619
+ },
620
+ {
621
+ "epoch": 2.37,
622
+ "grad_norm": 0.25166478753089905,
623
+ "learning_rate": 3.24870585152958e-05,
624
+ "loss": 0.1126,
625
+ "step": 760
626
+ },
627
+ {
628
+ "epoch": 2.4,
629
+ "grad_norm": 1.1833806037902832,
630
+ "learning_rate": 3.2274861218395145e-05,
631
+ "loss": 0.2144,
632
+ "step": 770
633
+ },
634
+ {
635
+ "epoch": 2.43,
636
+ "grad_norm": 3.9837536811828613,
637
+ "learning_rate": 3.206676838974329e-05,
638
+ "loss": 0.1465,
639
+ "step": 780
640
+ },
641
+ {
642
+ "epoch": 2.46,
643
+ "grad_norm": 0.4924279451370239,
644
+ "learning_rate": 3.1862649393858316e-05,
645
+ "loss": 0.1795,
646
+ "step": 790
647
+ },
648
+ {
649
+ "epoch": 2.49,
650
+ "grad_norm": 7.527681350708008,
651
+ "learning_rate": 3.166237934306518e-05,
652
+ "loss": 0.198,
653
+ "step": 800
654
+ },
655
+ {
656
+ "epoch": 2.49,
657
+ "eval_accuracy": 0.8224687933425797,
658
+ "eval_f1": 0.8322706898840134,
659
+ "eval_loss": 0.5284662246704102,
660
+ "eval_precision": 0.8615898706530778,
661
+ "eval_recall": 0.8224687933425797,
662
+ "eval_runtime": 36.8194,
663
+ "eval_samples_per_second": 78.328,
664
+ "eval_steps_per_second": 9.805,
665
+ "step": 800
666
+ },
667
+ {
668
+ "epoch": 2.52,
669
+ "grad_norm": 2.07871150970459,
670
+ "learning_rate": 3.146583877637763e-05,
671
+ "loss": 0.1241,
672
+ "step": 810
673
+ },
674
+ {
675
+ "epoch": 2.55,
676
+ "grad_norm": 8.270013809204102,
677
+ "learning_rate": 3.127291336003811e-05,
678
+ "loss": 0.1626,
679
+ "step": 820
680
+ },
681
+ {
682
+ "epoch": 2.59,
683
+ "grad_norm": 4.030840873718262,
684
+ "learning_rate": 3.1083493608010464e-05,
685
+ "loss": 0.0771,
686
+ "step": 830
687
+ },
688
+ {
689
+ "epoch": 2.62,
690
+ "grad_norm": 3.374802589416504,
691
+ "learning_rate": 3.0897474620873045e-05,
692
+ "loss": 0.0761,
693
+ "step": 840
694
+ },
695
+ {
696
+ "epoch": 2.65,
697
+ "grad_norm": 0.3307746946811676,
698
+ "learning_rate": 3.0714755841697564e-05,
699
+ "loss": 0.0477,
700
+ "step": 850
701
+ },
702
+ {
703
+ "epoch": 2.68,
704
+ "grad_norm": 5.573653221130371,
705
+ "learning_rate": 3.0535240827622965e-05,
706
+ "loss": 0.0799,
707
+ "step": 860
708
+ },
709
+ {
710
+ "epoch": 2.71,
711
+ "grad_norm": 1.2958701848983765,
712
+ "learning_rate": 3.035883703594582e-05,
713
+ "loss": 0.0984,
714
+ "step": 870
715
+ },
716
+ {
717
+ "epoch": 2.74,
718
+ "grad_norm": 3.165799379348755,
719
+ "learning_rate": 3.0185455623649106e-05,
720
+ "loss": 0.1073,
721
+ "step": 880
722
+ },
723
+ {
724
+ "epoch": 2.77,
725
+ "grad_norm": 0.7759466767311096,
726
+ "learning_rate": 3.0015011259383213e-05,
727
+ "loss": 0.0734,
728
+ "step": 890
729
+ },
730
+ {
731
+ "epoch": 2.8,
732
+ "grad_norm": 1.0856852531433105,
733
+ "learning_rate": 2.9847421946995018e-05,
734
+ "loss": 0.0311,
735
+ "step": 900
736
+ },
737
+ {
738
+ "epoch": 2.8,
739
+ "eval_accuracy": 0.8651178918169209,
740
+ "eval_f1": 0.8661898070442219,
741
+ "eval_loss": 0.4723583161830902,
742
+ "eval_precision": 0.8686654723138755,
743
+ "eval_recall": 0.8651178918169209,
744
+ "eval_runtime": 36.0484,
745
+ "eval_samples_per_second": 80.004,
746
+ "eval_steps_per_second": 10.014,
747
+ "step": 900
748
+ },
749
+ {
750
+ "epoch": 2.83,
751
+ "grad_norm": 0.283659428358078,
752
+ "learning_rate": 2.968260885977624e-05,
753
+ "loss": 0.0772,
754
+ "step": 910
755
+ },
756
+ {
757
+ "epoch": 2.87,
758
+ "grad_norm": 6.567378520965576,
759
+ "learning_rate": 2.9520496184669844e-05,
760
+ "loss": 0.1693,
761
+ "step": 920
762
+ },
763
+ {
764
+ "epoch": 2.9,
765
+ "grad_norm": 0.2684701681137085,
766
+ "learning_rate": 2.9361010975735175e-05,
767
+ "loss": 0.2045,
768
+ "step": 930
769
+ },
770
+ {
771
+ "epoch": 2.93,
772
+ "grad_norm": 3.3332574367523193,
773
+ "learning_rate": 2.9204083016228457e-05,
774
+ "loss": 0.0739,
775
+ "step": 940
776
+ },
777
+ {
778
+ "epoch": 2.96,
779
+ "grad_norm": 1.2437993288040161,
780
+ "learning_rate": 2.904964468870634e-05,
781
+ "loss": 0.0674,
782
+ "step": 950
783
+ },
784
+ {
785
+ "epoch": 2.99,
786
+ "grad_norm": 12.299838066101074,
787
+ "learning_rate": 2.8897630852606727e-05,
788
+ "loss": 0.1677,
789
+ "step": 960
790
+ },
791
+ {
792
+ "epoch": 3.02,
793
+ "grad_norm": 5.6871442794799805,
794
+ "learning_rate": 2.8747978728803455e-05,
795
+ "loss": 0.0329,
796
+ "step": 970
797
+ },
798
+ {
799
+ "epoch": 3.05,
800
+ "grad_norm": 0.1499292105436325,
801
+ "learning_rate": 2.8600627790670087e-05,
802
+ "loss": 0.019,
803
+ "step": 980
804
+ },
805
+ {
806
+ "epoch": 3.08,
807
+ "grad_norm": 0.24846801161766052,
808
+ "learning_rate": 2.8455519661223613e-05,
809
+ "loss": 0.0075,
810
+ "step": 990
811
+ },
812
+ {
813
+ "epoch": 3.12,
814
+ "grad_norm": 0.13459719717502594,
815
+ "learning_rate": 2.8312598015950882e-05,
816
+ "loss": 0.0543,
817
+ "step": 1000
818
+ },
819
+ {
820
+ "epoch": 3.12,
821
+ "eval_accuracy": 0.866504854368932,
822
+ "eval_f1": 0.8610792630184371,
823
+ "eval_loss": 0.4949225187301636,
824
+ "eval_precision": 0.8612246115926523,
825
+ "eval_recall": 0.866504854368932,
826
+ "eval_runtime": 36.1317,
827
+ "eval_samples_per_second": 79.819,
828
+ "eval_steps_per_second": 9.991,
829
+ "step": 1000
830
+ },
831
+ {
832
+ "epoch": 3.15,
833
+ "grad_norm": 5.778144836425781,
834
+ "learning_rate": 2.817180849095055e-05,
835
+ "loss": 0.0448,
836
+ "step": 1010
837
+ },
838
+ {
839
+ "epoch": 3.18,
840
+ "grad_norm": 0.5647971630096436,
841
+ "learning_rate": 2.803309859605025e-05,
842
+ "loss": 0.0279,
843
+ "step": 1020
844
+ },
845
+ {
846
+ "epoch": 3.21,
847
+ "grad_norm": 0.3868822157382965,
848
+ "learning_rate": 2.7896417632583534e-05,
849
+ "loss": 0.0175,
850
+ "step": 1030
851
+ },
852
+ {
853
+ "epoch": 3.24,
854
+ "grad_norm": 0.2270067036151886,
855
+ "learning_rate": 2.77617166155343e-05,
856
+ "loss": 0.0073,
857
+ "step": 1040
858
+ },
859
+ {
860
+ "epoch": 3.27,
861
+ "grad_norm": 0.43895870447158813,
862
+ "learning_rate": 2.762894819977688e-05,
863
+ "loss": 0.0243,
864
+ "step": 1050
865
+ },
866
+ {
867
+ "epoch": 3.3,
868
+ "grad_norm": 3.83394455909729,
869
+ "learning_rate": 2.749806661015982e-05,
870
+ "loss": 0.0217,
871
+ "step": 1060
872
+ },
873
+ {
874
+ "epoch": 3.33,
875
+ "grad_norm": 0.19903437793254852,
876
+ "learning_rate": 2.736902757519867e-05,
877
+ "loss": 0.0235,
878
+ "step": 1070
879
+ },
880
+ {
881
+ "epoch": 3.36,
882
+ "grad_norm": 0.2069879025220871,
883
+ "learning_rate": 2.724178826415978e-05,
884
+ "loss": 0.0398,
885
+ "step": 1080
886
+ },
887
+ {
888
+ "epoch": 3.4,
889
+ "grad_norm": 9.80591106414795,
890
+ "learning_rate": 2.711630722733202e-05,
891
+ "loss": 0.0649,
892
+ "step": 1090
893
+ },
894
+ {
895
+ "epoch": 3.43,
896
+ "grad_norm": 0.3856530487537384,
897
+ "learning_rate": 2.69925443392972e-05,
898
+ "loss": 0.0242,
899
+ "step": 1100
900
+ },
901
+ {
902
+ "epoch": 3.43,
903
+ "eval_accuracy": 0.8623439667128987,
904
+ "eval_f1": 0.8510243678619469,
905
+ "eval_loss": 0.6283301115036011,
906
+ "eval_precision": 0.8661193391176473,
907
+ "eval_recall": 0.8623439667128987,
908
+ "eval_runtime": 36.1111,
909
+ "eval_samples_per_second": 79.865,
910
+ "eval_steps_per_second": 9.997,
911
+ "step": 1100
912
+ },
913
+ {
914
+ "epoch": 3.46,
915
+ "grad_norm": 0.18870727717876434,
916
+ "learning_rate": 2.687046074502295e-05,
917
+ "loss": 0.0271,
918
+ "step": 1110
919
+ },
920
+ {
921
+ "epoch": 3.49,
922
+ "grad_norm": 1.5470973253250122,
923
+ "learning_rate": 2.675001880861359e-05,
924
+ "loss": 0.0398,
925
+ "step": 1120
926
+ },
927
+ {
928
+ "epoch": 3.52,
929
+ "grad_norm": 0.26312679052352905,
930
+ "learning_rate": 2.6631182064565375e-05,
931
+ "loss": 0.0141,
932
+ "step": 1130
933
+ },
934
+ {
935
+ "epoch": 3.55,
936
+ "grad_norm": 0.15462630987167358,
937
+ "learning_rate": 2.6513915171382936e-05,
938
+ "loss": 0.0761,
939
+ "step": 1140
940
+ },
941
+ {
942
+ "epoch": 3.58,
943
+ "grad_norm": 0.10996732115745544,
944
+ "learning_rate": 2.6398183867422732e-05,
945
+ "loss": 0.0048,
946
+ "step": 1150
947
+ },
948
+ {
949
+ "epoch": 3.61,
950
+ "grad_norm": 0.38109445571899414,
951
+ "learning_rate": 2.6283954928838412e-05,
952
+ "loss": 0.0243,
953
+ "step": 1160
954
+ },
955
+ {
956
+ "epoch": 3.64,
957
+ "grad_norm": 0.5097649097442627,
958
+ "learning_rate": 2.6171196129510684e-05,
959
+ "loss": 0.0362,
960
+ "step": 1170
961
+ },
962
+ {
963
+ "epoch": 3.68,
964
+ "grad_norm": 0.05090225115418434,
965
+ "learning_rate": 2.605987620285215e-05,
966
+ "loss": 0.0261,
967
+ "step": 1180
968
+ },
969
+ {
970
+ "epoch": 3.71,
971
+ "grad_norm": 0.6788052320480347,
972
+ "learning_rate": 2.5949964805384102e-05,
973
+ "loss": 0.0045,
974
+ "step": 1190
975
+ },
976
+ {
977
+ "epoch": 3.74,
978
+ "grad_norm": 0.026266297325491905,
979
+ "learning_rate": 2.5841432481989113e-05,
980
+ "loss": 0.0179,
981
+ "step": 1200
982
+ },
983
+ {
984
+ "epoch": 3.74,
985
+ "eval_accuracy": 0.8723994452149791,
986
+ "eval_f1": 0.8674518149970984,
987
+ "eval_loss": 0.5766238570213318,
988
+ "eval_precision": 0.8681300166641136,
989
+ "eval_recall": 0.8723994452149791,
990
+ "eval_runtime": 36.8086,
991
+ "eval_samples_per_second": 78.351,
992
+ "eval_steps_per_second": 9.807,
993
+ "step": 1200
994
+ },
995
+ {
996
+ "epoch": 3.77,
997
+ "grad_norm": 4.131288528442383,
998
+ "learning_rate": 2.573425063274894e-05,
999
+ "loss": 0.0311,
1000
+ "step": 1210
1001
+ },
1002
+ {
1003
+ "epoch": 3.8,
1004
+ "grad_norm": 0.09610464423894882,
1005
+ "learning_rate": 2.5628391481282988e-05,
1006
+ "loss": 0.0247,
1007
+ "step": 1220
1008
+ },
1009
+ {
1010
+ "epoch": 3.83,
1011
+ "grad_norm": 0.39180612564086914,
1012
+ "learning_rate": 2.5523828044507798e-05,
1013
+ "loss": 0.0185,
1014
+ "step": 1230
1015
+ },
1016
+ {
1017
+ "epoch": 3.86,
1018
+ "grad_norm": 0.017242716625332832,
1019
+ "learning_rate": 2.5420534103742737e-05,
1020
+ "loss": 0.0258,
1021
+ "step": 1240
1022
+ },
1023
+ {
1024
+ "epoch": 3.89,
1025
+ "grad_norm": 4.467873573303223,
1026
+ "learning_rate": 2.5318484177091666e-05,
1027
+ "loss": 0.0137,
1028
+ "step": 1250
1029
+ },
1030
+ {
1031
+ "epoch": 3.93,
1032
+ "grad_norm": 0.21439874172210693,
1033
+ "learning_rate": 2.5217653493034472e-05,
1034
+ "loss": 0.032,
1035
+ "step": 1260
1036
+ },
1037
+ {
1038
+ "epoch": 3.96,
1039
+ "grad_norm": 0.06672481447458267,
1040
+ "learning_rate": 2.511801796516642e-05,
1041
+ "loss": 0.0061,
1042
+ "step": 1270
1043
+ },
1044
+ {
1045
+ "epoch": 3.99,
1046
+ "grad_norm": 0.050495993345975876,
1047
+ "learning_rate": 2.501955416802672e-05,
1048
+ "loss": 0.0194,
1049
+ "step": 1280
1050
+ },
1051
+ {
1052
+ "epoch": 4.02,
1053
+ "grad_norm": 0.430034875869751,
1054
+ "learning_rate": 2.492223931396134e-05,
1055
+ "loss": 0.0138,
1056
+ "step": 1290
1057
+ },
1058
+ {
1059
+ "epoch": 4.05,
1060
+ "grad_norm": 0.020715204998850822,
1061
+ "learning_rate": 2.482605123096805e-05,
1062
+ "loss": 0.01,
1063
+ "step": 1300
1064
+ },
1065
+ {
1066
+ "epoch": 4.05,
1067
+ "eval_accuracy": 0.8595700416088765,
1068
+ "eval_f1": 0.8534690367784884,
1069
+ "eval_loss": 0.6232466697692871,
1070
+ "eval_precision": 0.8523064665048566,
1071
+ "eval_recall": 0.8595700416088765,
1072
+ "eval_runtime": 36.0575,
1073
+ "eval_samples_per_second": 79.983,
1074
+ "eval_steps_per_second": 10.012,
1075
+ "step": 1300
1076
+ },
+ {
+ "epoch": 4.08,
+ "grad_norm": 0.04595523327589035,
+ "learning_rate": 2.47309683414749e-05,
+ "loss": 0.0065,
+ "step": 1310
+ },
+ {
+ "epoch": 4.11,
+ "grad_norm": 0.03214849531650543,
+ "learning_rate": 2.4636969642005952e-05,
+ "loss": 0.0019,
+ "step": 1320
+ },
+ {
+ "epoch": 4.14,
+ "grad_norm": 0.10033855587244034,
+ "learning_rate": 2.45440346836908e-05,
+ "loss": 0.0039,
+ "step": 1330
+ },
+ {
+ "epoch": 4.17,
+ "grad_norm": 2.8943097591400146,
+ "learning_rate": 2.4452143553576716e-05,
+ "loss": 0.0023,
+ "step": 1340
+ },
+ {
+ "epoch": 4.21,
+ "grad_norm": 0.9686470627784729,
+ "learning_rate": 2.4361276856704794e-05,
+ "loss": 0.0037,
+ "step": 1350
+ },
+ {
+ "epoch": 4.24,
+ "grad_norm": 0.03016245923936367,
+ "learning_rate": 2.4271415698913302e-05,
+ "loss": 0.0018,
+ "step": 1360
+ },
+ {
+ "epoch": 4.27,
+ "grad_norm": 0.10285181552171707,
+ "learning_rate": 2.4182541670333722e-05,
+ "loss": 0.0017,
+ "step": 1370
+ },
+ {
+ "epoch": 4.3,
+ "grad_norm": 0.1811099499464035,
+ "learning_rate": 2.4094636829546745e-05,
+ "loss": 0.0011,
+ "step": 1380
+ },
+ {
+ "epoch": 4.33,
+ "grad_norm": 0.03486869856715202,
+ "learning_rate": 2.4007683688367184e-05,
+ "loss": 0.0013,
+ "step": 1390
+ },
+ {
+ "epoch": 4.36,
+ "grad_norm": 0.019455306231975555,
+ "learning_rate": 2.3921665197228592e-05,
+ "loss": 0.0018,
+ "step": 1400
+ },
+ {
+ "epoch": 4.36,
+ "eval_accuracy": 0.874133148404993,
+ "eval_f1": 0.8710107373815618,
+ "eval_loss": 0.6012547612190247,
+ "eval_precision": 0.8707403535526532,
+ "eval_recall": 0.874133148404993,
+ "eval_runtime": 35.9997,
+ "eval_samples_per_second": 80.112,
+ "eval_steps_per_second": 10.028,
+ "step": 1400
+ },
+ {
+ "epoch": 4.39,
+ "grad_norm": 3.699671745300293,
+ "learning_rate": 2.383656473113981e-05,
+ "loss": 0.0054,
+ "step": 1410
+ },
+ {
+ "epoch": 4.42,
+ "grad_norm": 0.07088273018598557,
+ "learning_rate": 2.3752366076187175e-05,
+ "loss": 0.0012,
+ "step": 1420
+ },
+ {
+ "epoch": 4.45,
+ "grad_norm": 0.006407163105905056,
+ "learning_rate": 2.3669053416557544e-05,
+ "loss": 0.0058,
+ "step": 1430
+ },
+ {
+ "epoch": 4.49,
+ "grad_norm": 0.24147652089595795,
+ "learning_rate": 2.35866113220585e-05,
+ "loss": 0.0288,
+ "step": 1440
+ },
+ {
+ "epoch": 4.52,
+ "grad_norm": 0.0284121036529541,
+ "learning_rate": 2.3505024736113422e-05,
+ "loss": 0.0012,
+ "step": 1450
+ },
+ {
+ "epoch": 4.55,
+ "grad_norm": 0.10920742154121399,
+ "learning_rate": 2.3424278964210216e-05,
+ "loss": 0.0171,
+ "step": 1460
+ },
+ {
+ "epoch": 4.58,
+ "grad_norm": 0.02022623084485531,
+ "learning_rate": 2.334435966278354e-05,
+ "loss": 0.0013,
+ "step": 1470
+ },
+ {
+ "epoch": 4.61,
+ "grad_norm": 1.6955662965774536,
+ "learning_rate": 2.3265252828511455e-05,
+ "loss": 0.0043,
+ "step": 1480
+ },
+ {
+ "epoch": 4.64,
+ "grad_norm": 0.2646436095237732,
+ "learning_rate": 2.3186944788008412e-05,
+ "loss": 0.007,
+ "step": 1490
+ },
+ {
+ "epoch": 4.67,
+ "grad_norm": 0.02134719491004944,
+ "learning_rate": 2.3109422187897257e-05,
+ "loss": 0.0019,
+ "step": 1500
+ },
+ {
+ "epoch": 4.67,
+ "eval_accuracy": 0.8682385575589459,
+ "eval_f1": 0.8643329503142037,
+ "eval_loss": 0.6553735136985779,
+ "eval_precision": 0.8688611568946607,
+ "eval_recall": 0.8682385575589459,
+ "eval_runtime": 36.5058,
+ "eval_samples_per_second": 79.001,
+ "eval_steps_per_second": 9.889,
+ "step": 1500
+ },
+ {
+ "epoch": 4.7,
+ "grad_norm": 0.10854582488536835,
+ "learning_rate": 2.3032671985243938e-05,
+ "loss": 0.0035,
+ "step": 1510
+ },
+ {
+ "epoch": 4.74,
+ "grad_norm": 0.023413635790348053,
+ "learning_rate": 2.2956681438339396e-05,
+ "loss": 0.0014,
+ "step": 1520
+ },
+ {
+ "epoch": 4.77,
+ "grad_norm": 0.03499768301844597,
+ "learning_rate": 2.2881438097813777e-05,
+ "loss": 0.001,
+ "step": 1530
+ },
+ {
+ "epoch": 4.8,
+ "grad_norm": 0.28913193941116333,
+ "learning_rate": 2.2806929798068923e-05,
+ "loss": 0.0016,
+ "step": 1540
+ },
+ {
+ "epoch": 4.83,
+ "grad_norm": 0.02359519526362419,
+ "learning_rate": 2.273314464901578e-05,
+ "loss": 0.017,
+ "step": 1550
+ },
+ {
+ "epoch": 4.86,
+ "grad_norm": 0.20512829720973969,
+ "learning_rate": 2.2660071028103958e-05,
+ "loss": 0.0011,
+ "step": 1560
+ },
+ {
+ "epoch": 4.89,
+ "grad_norm": 0.017939003184437752,
+ "learning_rate": 2.2587697572631283e-05,
+ "loss": 0.0114,
+ "step": 1570
+ },
+ {
+ "epoch": 4.92,
+ "grad_norm": 0.05149148404598236,
+ "learning_rate": 2.2516013172321875e-05,
+ "loss": 0.0008,
+ "step": 1580
+ },
+ {
+ "epoch": 4.95,
+ "grad_norm": 0.03117392770946026,
+ "learning_rate": 2.2445006962161678e-05,
+ "loss": 0.001,
+ "step": 1590
+ },
+ {
+ "epoch": 4.98,
+ "grad_norm": 0.03808211162686348,
+ "learning_rate": 2.2374668315480894e-05,
+ "loss": 0.0024,
+ "step": 1600
+ },
+ {
+ "epoch": 4.98,
+ "eval_accuracy": 0.8713592233009708,
+ "eval_f1": 0.8718548085138199,
+ "eval_loss": 0.6107261776924133,
+ "eval_precision": 0.8729661517370155,
+ "eval_recall": 0.8713592233009708,
+ "eval_runtime": 35.9356,
+ "eval_samples_per_second": 80.255,
+ "eval_steps_per_second": 10.046,
+ "step": 1600
+ },
+ {
+ "epoch": 5.02,
+ "grad_norm": 0.016484640538692474,
+ "learning_rate": 2.2304986837273524e-05,
+ "loss": 0.004,
+ "step": 1610
+ },
+ {
+ "epoch": 5.05,
+ "grad_norm": 0.026364829391241074,
+ "learning_rate": 2.2235952357744237e-05,
+ "loss": 0.001,
+ "step": 1620
+ },
+ {
+ "epoch": 5.08,
+ "grad_norm": 0.007881557568907738,
+ "learning_rate": 2.2167554926073632e-05,
+ "loss": 0.0005,
+ "step": 1630
+ },
+ {
+ "epoch": 5.11,
+ "grad_norm": 0.02089390717446804,
+ "learning_rate": 2.2099784804393198e-05,
+ "loss": 0.0005,
+ "step": 1640
+ },
+ {
+ "epoch": 5.14,
+ "grad_norm": 0.01271623745560646,
+ "learning_rate": 2.2032632461961585e-05,
+ "loss": 0.0008,
+ "step": 1650
+ },
+ {
+ "epoch": 5.17,
+ "grad_norm": 0.006357505917549133,
+ "learning_rate": 2.196608856953445e-05,
+ "loss": 0.0005,
+ "step": 1660
+ },
+ {
+ "epoch": 5.2,
+ "grad_norm": 0.0172483678907156,
+ "learning_rate": 2.1900143993920144e-05,
+ "loss": 0.0005,
+ "step": 1670
+ },
+ {
+ "epoch": 5.23,
+ "grad_norm": 0.009964230470359325,
+ "learning_rate": 2.1834789792714154e-05,
+ "loss": 0.0005,
+ "step": 1680
+ },
+ {
+ "epoch": 5.26,
+ "grad_norm": 0.017695054411888123,
+ "learning_rate": 2.1770017209205408e-05,
+ "loss": 0.0015,
+ "step": 1690
+ },
+ {
+ "epoch": 5.3,
+ "grad_norm": 0.058045148849487305,
+ "learning_rate": 2.170581766744771e-05,
+ "loss": 0.0006,
+ "step": 1700
+ },
+ {
+ "epoch": 5.3,
+ "eval_accuracy": 0.8755201109570042,
+ "eval_f1": 0.8725140598764357,
+ "eval_loss": 0.6352503299713135,
+ "eval_precision": 0.8751361058141811,
+ "eval_recall": 0.8755201109570042,
+ "eval_runtime": 35.8249,
+ "eval_samples_per_second": 80.503,
+ "eval_steps_per_second": 10.077,
+ "step": 1700
+ },
+ {
+ "epoch": 5.3,
+ "step": 1700,
+ "total_flos": 2.1047767559471923e+18,
+ "train_loss": 0.21456260421779005,
+ "train_runtime": 1227.9557,
+ "train_samples_per_second": 417.605,
+ "train_steps_per_second": 26.141
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 32100,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 100,
+ "save_steps": 100,
+ "total_flos": 2.1047767559471923e+18,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }