sharren committed on
Commit 727f6e7
1 Parent(s): ab1727e

🍻 cheers

README.md CHANGED
@@ -2,6 +2,7 @@
  license: apache-2.0
  base_model: google/vit-base-patch16-224
  tags:
+ - image-classification
  - generated_from_trainer
  metrics:
  - accuracy
@@ -18,13 +19,13 @@ should probably proofread and complete it, then remove this comment. -->

  # vit-lr-linear

- This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on an unknown dataset.
+ This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on the skin-cancer dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.8046
- - Accuracy: 0.8499
- - Precision: 0.8547
- - Recall: 0.8499
- - F1: 0.8506
+ - Loss: 0.4920
+ - Accuracy: 0.8322
+ - Precision: 0.8400
+ - Recall: 0.8322
+ - F1: 0.8323

  ## Model description

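The card does not say how Precision, Recall and F1 are averaged over classes. In both the old and the new numbers, Recall equals Accuracy exactly, and this is what a support-weighted average produces. Weighted recall is Σ_c (n_c/N)(TP_c/n_c) = ΣTP_c/N, which is accuracy. A micro average would also force Precision to equal Accuracy, and here it does not. The sketch below assumes weighted averaging, because the training script is not part of this commit. It recomputes the four metrics in pure Python on made-up labels.

```python
import math
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Assumption: this mirrors "weighted" averaging; it is a sketch, not the
    compute_metrics function actually used to train this model.
    """
    n = len(y_true)
    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class

    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        weight = n_c / n  # each class counts in proportion to its share of true labels
        p_c = tp[c] / predicted[c] if predicted[c] else 0.0
        r_c = tp[c] / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    accuracy = sum(tp.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for a 3-class problem (not the skin-cancer data).
y_true = [0, 0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 2, 0, 0]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)

# Weighted recall reduces to accuracy, up to floating-point rounding.
assert math.isclose(metrics["recall"], metrics["accuracy"])
```

With this averaging, Precision can move away from Accuracy while Recall cannot. The new card shows exactly that pattern (0.8400 vs 0.8322).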
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 5.61,
+ "eval_accuracy": 0.8321775312066574,
+ "eval_f1": 0.8323189306799796,
+ "eval_loss": 0.4919604957103729,
+ "eval_precision": 0.8400335349095535,
+ "eval_recall": 0.8321775312066574,
+ "eval_runtime": 40.0976,
+ "eval_samples_per_second": 71.924,
+ "eval_steps_per_second": 9.003,
+ "total_flos": 2.2287694956200755e+18,
+ "train_loss": 0.26933162180913817,
+ "train_runtime": 1430.4649,
+ "train_samples_per_second": 358.485,
+ "train_steps_per_second": 22.44
+ }
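`all_results.json` adds nothing of its own. It is the union of `eval_results.json` and `train_results.json`, both added in the same commit, with the shared `"epoch"` key appearing once. This is consistent with `Trainer.save_metrics`, which by default also merges each split's metrics into `all_results.json`. The quick check below pastes the values in; in practice you would `json.load` each file instead.

```python
# Values copied from the three results files added in this commit.
eval_results = {
    "epoch": 5.61,
    "eval_accuracy": 0.8321775312066574,
    "eval_f1": 0.8323189306799796,
    "eval_loss": 0.4919604957103729,
    "eval_precision": 0.8400335349095535,
    "eval_recall": 0.8321775312066574,
    "eval_runtime": 40.0976,
    "eval_samples_per_second": 71.924,
    "eval_steps_per_second": 9.003,
}
train_results = {
    "epoch": 5.61,
    "total_flos": 2.2287694956200755e18,
    "train_loss": 0.26933162180913817,
    "train_runtime": 1430.4649,
    "train_samples_per_second": 358.485,
    "train_steps_per_second": 22.44,
}
all_results = {
    "epoch": 5.61,
    "eval_accuracy": 0.8321775312066574,
    "eval_f1": 0.8323189306799796,
    "eval_loss": 0.4919604957103729,
    "eval_precision": 0.8400335349095535,
    "eval_recall": 0.8321775312066574,
    "eval_runtime": 40.0976,
    "eval_samples_per_second": 71.924,
    "eval_steps_per_second": 9.003,
    "total_flos": 2.2287694956200755e18,
    "train_loss": 0.26933162180913817,
    "train_runtime": 1430.4649,
    "train_samples_per_second": 358.485,
    "train_steps_per_second": 22.44,
}

# "epoch" is identical in both inputs, so merge order does not matter.
merged = {**eval_results, **train_results}
print(merged == all_results)  # True
```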
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 5.61,
+ "eval_accuracy": 0.8321775312066574,
+ "eval_f1": 0.8323189306799796,
+ "eval_loss": 0.4919604957103729,
+ "eval_precision": 0.8400335349095535,
+ "eval_recall": 0.8321775312066574,
+ "eval_runtime": 40.0976,
+ "eval_samples_per_second": 71.924,
+ "eval_steps_per_second": 9.003
+ }
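The evaluation set size is not logged directly. It can be inferred from the speed metrics, because rate × runtime recovers the totals the rates were computed from. This sketch derives the number of evaluation images, batches and correct predictions. It also checks the batch count against an assumed single-device evaluation batch size of 8, which is an inference and is not recorded in these files.

```python
import math

# Speed and accuracy values from eval_results.json.
eval_runtime = 40.0976            # seconds
eval_samples_per_second = 71.924
eval_steps_per_second = 9.003
eval_accuracy = 0.8321775312066574

# Rates were rounded to 3 decimals, so round the products back to integers.
n_images = round(eval_samples_per_second * eval_runtime)
n_batches = round(eval_steps_per_second * eval_runtime)
n_correct = round(eval_accuracy * n_images)

print(n_images, n_batches, n_correct)  # 2884 361 2400

# Assumption: batch size 8 on one device; it is the only integer giving 361 batches.
assumed_batch_size = 8
assert math.ceil(n_images / assumed_batch_size) == n_batches
assert math.isclose(n_correct / n_images, eval_accuracy)
```

On that reading, the reported accuracy of 0.8322 corresponds to 2,400 correct predictions out of 2,884 evaluation images.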
runs/Mar18_14-12-44_9c311a5b3773/events.out.tfevents.1710772666.9c311a5b3773.3314.19 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b345a873dec8ec1bff10a76d535ae7bbe9b4c5de74b7ecb01088b46f2eb1cf30
+ size 560
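The TensorBoard event file is stored with Git LFS, so the diff shows a three-line pointer (`version`, `oid`, `size`) rather than the 560-byte log itself. A small sketch that parses such a pointer and checks downloaded bytes against it follows. The helper names `parse_lfs_pointer` and `matches_pointer` are made up for illustration.

```python
import hashlib


def parse_lfs_pointer(text):
    """Split a Git LFS pointer ("key value" per line) into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,        # hash algorithm named in the oid, e.g. "sha256"
        "oid": digest,                 # hex digest of the real file contents
        "size": int(fields["size"]),   # size of the real file in bytes
    }


def matches_pointer(data, pointer):
    """True if `data` has the size and SHA-256 digest recorded in the pointer."""
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b345a873dec8ec1bff10a76d535ae7bbe9b4c5de74b7ecb01088b46f2eb1cf30\n"
    "size 560\n"
)
print(pointer["algorithm"], pointer["size"])  # sha256 560
```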
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 5.61,
+ "total_flos": 2.2287694956200755e+18,
+ "train_loss": 0.26933162180913817,
+ "train_runtime": 1430.4649,
+ "train_samples_per_second": 358.485,
+ "train_steps_per_second": 22.44
+ }
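These throughput figures do not match the 1,800 steps recorded in the `trainer_state.json` added in this commit. At 22.44 steps/s, 1,430 s would cover about 32,100 steps. One reading consistent with the numbers, though the commit does not state it, is that the rates were computed over the planned schedule rather than over the steps actually run. The arithmetic below recovers that schedule. It then checks the schedule against a few logged learning rates, under two assumptions that appear nowhere in the commit: a linear decay to zero with no warmup, and a peak rate of 1e-4.

```python
import math

# From train_results.json and the trainer_state.json header.
train_runtime = 1430.4649          # seconds
train_samples_per_second = 358.485
train_steps_per_second = 22.44
global_step = 1800
epoch = 5.607476635514018

steps_per_epoch = round(global_step / epoch)                        # 321
planned_steps = round(train_steps_per_second * train_runtime)       # 32100
planned_samples = round(train_samples_per_second * train_runtime)   # 512800
planned_epochs = planned_steps / steps_per_epoch                    # 100.0
samples_per_epoch = planned_samples / planned_epochs                # 5128.0
print(steps_per_epoch, planned_steps, planned_samples, planned_epochs, samples_per_epoch)


def linear_lr(step, peak_lr=1e-4, total_steps=32100):
    """Assumed schedule: linear decay from peak_lr to 0, no warmup.

    The logged rates line up with the value one step earlier, hence (step - 1).
    """
    return peak_lr * (total_steps - (step - 1)) / total_steps


# A few learning rates copied from log_history in trainer_state.json.
logged_lr = {
    10: 9.997196261682243e-05,
    800: 9.751090342679128e-05,
    1400: 9.56417445482866e-05,
}
for step, lr in logged_lr.items():
    assert math.isclose(linear_lr(step, total_steps=planned_steps), lr, rel_tol=1e-9)
```

Under these assumptions, the run was scheduled for 100 epochs of 321 steps each. That is about 5,128 training images per epoch, consistent with a training batch size of 16, since ceil(5128 / 16) = 321. Training stopped after 1,800 of the 32,100 planned steps; the commit does not say why.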
trainer_state.json ADDED
@@ -0,0 +1,1506 @@
+ {
+ "best_metric": 0.4919604957103729,
+ "best_model_checkpoint": "./vit-lr-linear/checkpoint-800",
+ "epoch": 5.607476635514018,
+ "eval_steps": 100,
+ "global_step": 1800,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.03,
+ "grad_norm": 3.5967516899108887,
+ "learning_rate": 9.997196261682243e-05,
+ "loss": 1.2508,
+ "step": 10
+ },
+ {
+ "epoch": 0.06,
+ "grad_norm": 4.702871799468994,
+ "learning_rate": 9.994080996884736e-05,
+ "loss": 0.7499,
+ "step": 20
+ },
+ {
+ "epoch": 0.09,
+ "grad_norm": 6.171701431274414,
+ "learning_rate": 9.990965732087227e-05,
+ "loss": 0.8834,
+ "step": 30
+ },
+ {
+ "epoch": 0.12,
+ "grad_norm": 10.749275207519531,
+ "learning_rate": 9.98785046728972e-05,
+ "loss": 1.0502,
+ "step": 40
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 4.334569931030273,
+ "learning_rate": 9.984735202492212e-05,
+ "loss": 0.769,
+ "step": 50
+ },
+ {
+ "epoch": 0.19,
+ "grad_norm": 4.809230327606201,
+ "learning_rate": 9.981619937694705e-05,
+ "loss": 0.9034,
+ "step": 60
+ },
+ {
+ "epoch": 0.22,
+ "grad_norm": 3.983206272125244,
+ "learning_rate": 9.978504672897196e-05,
+ "loss": 0.8274,
+ "step": 70
+ },
+ {
+ "epoch": 0.25,
+ "grad_norm": 3.574300765991211,
+ "learning_rate": 9.975389408099689e-05,
+ "loss": 0.6348,
+ "step": 80
+ },
+ {
+ "epoch": 0.28,
+ "grad_norm": 5.728092670440674,
+ "learning_rate": 9.972274143302182e-05,
+ "loss": 0.6568,
+ "step": 90
+ },
+ {
+ "epoch": 0.31,
+ "grad_norm": 4.649799823760986,
+ "learning_rate": 9.969158878504672e-05,
+ "loss": 0.6029,
+ "step": 100
+ },
+ {
+ "epoch": 0.31,
+ "eval_accuracy": 0.7805131761442441,
+ "eval_f1": 0.7528727438506023,
+ "eval_loss": 0.6126354932785034,
+ "eval_precision": 0.7602026831090402,
+ "eval_recall": 0.7805131761442441,
+ "eval_runtime": 39.2505,
+ "eval_samples_per_second": 73.477,
+ "eval_steps_per_second": 9.197,
+ "step": 100
+ },
+ {
+ "epoch": 0.34,
+ "grad_norm": 5.011153221130371,
+ "learning_rate": 9.966043613707165e-05,
+ "loss": 0.6847,
+ "step": 110
+ },
+ {
+ "epoch": 0.37,
+ "grad_norm": 7.656059741973877,
+ "learning_rate": 9.962928348909658e-05,
+ "loss": 0.6162,
+ "step": 120
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 7.763223648071289,
+ "learning_rate": 9.95981308411215e-05,
+ "loss": 0.6389,
+ "step": 130
+ },
+ {
+ "epoch": 0.44,
+ "grad_norm": 2.2059271335601807,
+ "learning_rate": 9.956697819314643e-05,
+ "loss": 0.5381,
+ "step": 140
+ },
+ {
+ "epoch": 0.47,
+ "grad_norm": 6.06673002243042,
+ "learning_rate": 9.953582554517134e-05,
+ "loss": 0.5674,
+ "step": 150
+ },
+ {
+ "epoch": 0.5,
+ "grad_norm": 5.407893657684326,
+ "learning_rate": 9.950467289719627e-05,
+ "loss": 0.5598,
+ "step": 160
+ },
+ {
+ "epoch": 0.53,
+ "grad_norm": 7.515843391418457,
+ "learning_rate": 9.947352024922119e-05,
+ "loss": 0.5488,
+ "step": 170
+ },
+ {
+ "epoch": 0.56,
+ "grad_norm": 7.197587966918945,
+ "learning_rate": 9.944236760124612e-05,
+ "loss": 0.7078,
+ "step": 180
+ },
+ {
+ "epoch": 0.59,
+ "grad_norm": 4.026732921600342,
+ "learning_rate": 9.941121495327103e-05,
+ "loss": 0.595,
+ "step": 190
+ },
+ {
+ "epoch": 0.62,
+ "grad_norm": 5.807314872741699,
+ "learning_rate": 9.938006230529595e-05,
+ "loss": 0.5726,
+ "step": 200
+ },
+ {
+ "epoch": 0.62,
+ "eval_accuracy": 0.7649098474341193,
+ "eval_f1": 0.7177347277722194,
+ "eval_loss": 0.6950196623802185,
+ "eval_precision": 0.7613231169607695,
+ "eval_recall": 0.7649098474341193,
+ "eval_runtime": 39.3588,
+ "eval_samples_per_second": 73.275,
+ "eval_steps_per_second": 9.172,
+ "step": 200
+ },
+ {
+ "epoch": 0.65,
+ "grad_norm": 5.1864399909973145,
+ "learning_rate": 9.934890965732088e-05,
+ "loss": 0.6681,
+ "step": 210
+ },
+ {
+ "epoch": 0.69,
+ "grad_norm": 7.364803314208984,
+ "learning_rate": 9.93177570093458e-05,
+ "loss": 0.7502,
+ "step": 220
+ },
+ {
+ "epoch": 0.72,
+ "grad_norm": 1.9555000066757202,
+ "learning_rate": 9.928660436137072e-05,
+ "loss": 0.3976,
+ "step": 230
+ },
+ {
+ "epoch": 0.75,
+ "grad_norm": 2.5017478466033936,
+ "learning_rate": 9.925545171339564e-05,
+ "loss": 0.5505,
+ "step": 240
+ },
+ {
+ "epoch": 0.78,
+ "grad_norm": 4.633613109588623,
+ "learning_rate": 9.922429906542056e-05,
+ "loss": 0.7452,
+ "step": 250
+ },
+ {
+ "epoch": 0.81,
+ "grad_norm": 4.068040370941162,
+ "learning_rate": 9.91931464174455e-05,
+ "loss": 0.5837,
+ "step": 260
+ },
+ {
+ "epoch": 0.84,
+ "grad_norm": 3.191166639328003,
+ "learning_rate": 9.916199376947041e-05,
+ "loss": 0.4488,
+ "step": 270
+ },
+ {
+ "epoch": 0.87,
+ "grad_norm": 2.8749473094940186,
+ "learning_rate": 9.913084112149534e-05,
+ "loss": 0.4906,
+ "step": 280
+ },
+ {
+ "epoch": 0.9,
+ "grad_norm": 7.547589302062988,
+ "learning_rate": 9.909968847352025e-05,
+ "loss": 0.6191,
+ "step": 290
+ },
+ {
+ "epoch": 0.93,
+ "grad_norm": 5.1279096603393555,
+ "learning_rate": 9.906853582554517e-05,
+ "loss": 0.6521,
+ "step": 300
+ },
+ {
+ "epoch": 0.93,
+ "eval_accuracy": 0.8124133148404993,
+ "eval_f1": 0.8060484287452215,
+ "eval_loss": 0.5102406740188599,
+ "eval_precision": 0.8148843275891411,
+ "eval_recall": 0.8124133148404993,
+ "eval_runtime": 39.7509,
+ "eval_samples_per_second": 72.552,
+ "eval_steps_per_second": 9.082,
+ "step": 300
+ },
+ {
+ "epoch": 0.97,
+ "grad_norm": 3.052353858947754,
+ "learning_rate": 9.90373831775701e-05,
+ "loss": 0.5616,
+ "step": 310
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 4.507020473480225,
+ "learning_rate": 9.900623052959503e-05,
+ "loss": 0.5192,
+ "step": 320
+ },
+ {
+ "epoch": 1.03,
+ "grad_norm": 5.176488876342773,
+ "learning_rate": 9.897507788161994e-05,
+ "loss": 0.4461,
+ "step": 330
+ },
+ {
+ "epoch": 1.06,
+ "grad_norm": 4.797460079193115,
+ "learning_rate": 9.894392523364486e-05,
+ "loss": 0.3651,
+ "step": 340
+ },
+ {
+ "epoch": 1.09,
+ "grad_norm": 2.186629295349121,
+ "learning_rate": 9.891277258566979e-05,
+ "loss": 0.4004,
+ "step": 350
+ },
+ {
+ "epoch": 1.12,
+ "grad_norm": 2.1874918937683105,
+ "learning_rate": 9.888161993769472e-05,
+ "loss": 0.3475,
+ "step": 360
+ },
+ {
+ "epoch": 1.15,
+ "grad_norm": 4.8143486976623535,
+ "learning_rate": 9.885046728971963e-05,
+ "loss": 0.4073,
+ "step": 370
+ },
+ {
+ "epoch": 1.18,
+ "grad_norm": 7.921704292297363,
+ "learning_rate": 9.881931464174455e-05,
+ "loss": 0.3765,
+ "step": 380
+ },
+ {
+ "epoch": 1.21,
+ "grad_norm": 5.491533279418945,
+ "learning_rate": 9.878816199376948e-05,
+ "loss": 0.4707,
+ "step": 390
+ },
+ {
+ "epoch": 1.25,
+ "grad_norm": 2.563472032546997,
+ "learning_rate": 9.875700934579439e-05,
+ "loss": 0.3803,
+ "step": 400
+ },
+ {
+ "epoch": 1.25,
+ "eval_accuracy": 0.7843273231622746,
+ "eval_f1": 0.7933752302092828,
+ "eval_loss": 0.6124634146690369,
+ "eval_precision": 0.8128379357321871,
+ "eval_recall": 0.7843273231622746,
+ "eval_runtime": 39.1164,
+ "eval_samples_per_second": 73.729,
+ "eval_steps_per_second": 9.229,
+ "step": 400
+ },
+ {
+ "epoch": 1.28,
+ "grad_norm": 3.978813648223877,
+ "learning_rate": 9.872585669781932e-05,
+ "loss": 0.4527,
+ "step": 410
+ },
+ {
+ "epoch": 1.31,
+ "grad_norm": 3.3713996410369873,
+ "learning_rate": 9.869470404984425e-05,
+ "loss": 0.4344,
+ "step": 420
+ },
+ {
+ "epoch": 1.34,
+ "grad_norm": 3.685149669647217,
+ "learning_rate": 9.866355140186917e-05,
+ "loss": 0.5163,
+ "step": 430
+ },
+ {
+ "epoch": 1.37,
+ "grad_norm": 3.9468307495117188,
+ "learning_rate": 9.863239875389408e-05,
+ "loss": 0.4187,
+ "step": 440
+ },
+ {
+ "epoch": 1.4,
+ "grad_norm": 4.608047962188721,
+ "learning_rate": 9.860124610591901e-05,
+ "loss": 0.3643,
+ "step": 450
+ },
+ {
+ "epoch": 1.43,
+ "grad_norm": 3.3221333026885986,
+ "learning_rate": 9.857009345794394e-05,
+ "loss": 0.3324,
+ "step": 460
+ },
+ {
+ "epoch": 1.46,
+ "grad_norm": 3.251314640045166,
+ "learning_rate": 9.853894080996885e-05,
+ "loss": 0.3229,
+ "step": 470
+ },
+ {
+ "epoch": 1.5,
+ "grad_norm": 7.8897552490234375,
+ "learning_rate": 9.850778816199377e-05,
+ "loss": 0.3684,
+ "step": 480
+ },
+ {
+ "epoch": 1.53,
+ "grad_norm": 5.32474422454834,
+ "learning_rate": 9.84766355140187e-05,
+ "loss": 0.3567,
+ "step": 490
+ },
+ {
+ "epoch": 1.56,
+ "grad_norm": 2.955794334411621,
+ "learning_rate": 9.844548286604361e-05,
+ "loss": 0.4048,
+ "step": 500
+ },
+ {
+ "epoch": 1.56,
+ "eval_accuracy": 0.8214285714285714,
+ "eval_f1": 0.80775004871548,
+ "eval_loss": 0.5058895349502563,
+ "eval_precision": 0.8156080811716645,
+ "eval_recall": 0.8214285714285714,
+ "eval_runtime": 39.0979,
+ "eval_samples_per_second": 73.764,
+ "eval_steps_per_second": 9.233,
+ "step": 500
+ },
+ {
+ "epoch": 1.59,
+ "grad_norm": 3.655855894088745,
+ "learning_rate": 9.841433021806854e-05,
+ "loss": 0.4464,
+ "step": 510
+ },
+ {
+ "epoch": 1.62,
+ "grad_norm": 3.6523778438568115,
+ "learning_rate": 9.838317757009346e-05,
+ "loss": 0.4154,
+ "step": 520
+ },
+ {
+ "epoch": 1.65,
+ "grad_norm": 6.649527072906494,
+ "learning_rate": 9.835202492211837e-05,
+ "loss": 0.4337,
+ "step": 530
+ },
+ {
+ "epoch": 1.68,
+ "grad_norm": 4.308875560760498,
+ "learning_rate": 9.83208722741433e-05,
+ "loss": 0.3551,
+ "step": 540
+ },
+ {
+ "epoch": 1.71,
+ "grad_norm": 5.290976047515869,
+ "learning_rate": 9.828971962616823e-05,
+ "loss": 0.4329,
+ "step": 550
+ },
+ {
+ "epoch": 1.74,
+ "grad_norm": 3.688100814819336,
+ "learning_rate": 9.825856697819316e-05,
+ "loss": 0.4306,
+ "step": 560
+ },
+ {
+ "epoch": 1.78,
+ "grad_norm": 2.4794669151306152,
+ "learning_rate": 9.822741433021808e-05,
+ "loss": 0.3281,
+ "step": 570
+ },
+ {
+ "epoch": 1.81,
+ "grad_norm": 3.5106523036956787,
+ "learning_rate": 9.819626168224299e-05,
+ "loss": 0.3184,
+ "step": 580
+ },
+ {
+ "epoch": 1.84,
+ "grad_norm": 5.135136604309082,
+ "learning_rate": 9.816510903426792e-05,
+ "loss": 0.3644,
+ "step": 590
+ },
+ {
+ "epoch": 1.87,
+ "grad_norm": 3.2593114376068115,
+ "learning_rate": 9.813395638629284e-05,
+ "loss": 0.2939,
+ "step": 600
+ },
+ {
+ "epoch": 1.87,
+ "eval_accuracy": 0.7680305131761442,
+ "eval_f1": 0.7817959651329371,
+ "eval_loss": 0.6723023653030396,
+ "eval_precision": 0.8366418813297796,
+ "eval_recall": 0.7680305131761442,
+ "eval_runtime": 39.3089,
+ "eval_samples_per_second": 73.368,
+ "eval_steps_per_second": 9.184,
+ "step": 600
+ },
+ {
+ "epoch": 1.9,
+ "grad_norm": 7.719142913818359,
+ "learning_rate": 9.810280373831777e-05,
+ "loss": 0.5629,
+ "step": 610
+ },
+ {
+ "epoch": 1.93,
+ "grad_norm": 4.130067348480225,
+ "learning_rate": 9.807165109034268e-05,
+ "loss": 0.4416,
+ "step": 620
+ },
+ {
+ "epoch": 1.96,
+ "grad_norm": 1.9364572763442993,
+ "learning_rate": 9.80404984423676e-05,
+ "loss": 0.2339,
+ "step": 630
+ },
+ {
+ "epoch": 1.99,
+ "grad_norm": 2.4033007621765137,
+ "learning_rate": 9.800934579439253e-05,
+ "loss": 0.3284,
+ "step": 640
+ },
+ {
+ "epoch": 2.02,
+ "grad_norm": 3.25533390045166,
+ "learning_rate": 9.797819314641746e-05,
+ "loss": 0.2307,
+ "step": 650
+ },
+ {
+ "epoch": 2.06,
+ "grad_norm": 3.1277670860290527,
+ "learning_rate": 9.794704049844237e-05,
+ "loss": 0.2063,
+ "step": 660
+ },
+ {
+ "epoch": 2.09,
+ "grad_norm": 2.923804521560669,
+ "learning_rate": 9.791588785046729e-05,
+ "loss": 0.1563,
+ "step": 670
+ },
+ {
+ "epoch": 2.12,
+ "grad_norm": 4.745158672332764,
+ "learning_rate": 9.788473520249222e-05,
+ "loss": 0.2451,
+ "step": 680
+ },
+ {
+ "epoch": 2.15,
+ "grad_norm": 5.691440582275391,
+ "learning_rate": 9.785358255451714e-05,
+ "loss": 0.2315,
+ "step": 690
+ },
+ {
+ "epoch": 2.18,
+ "grad_norm": 7.0971550941467285,
+ "learning_rate": 9.782242990654206e-05,
+ "loss": 0.2138,
+ "step": 700
+ },
+ {
+ "epoch": 2.18,
+ "eval_accuracy": 0.812760055478502,
+ "eval_f1": 0.8169834700772869,
+ "eval_loss": 0.635110080242157,
+ "eval_precision": 0.8480203726240728,
+ "eval_recall": 0.812760055478502,
+ "eval_runtime": 39.4295,
+ "eval_samples_per_second": 73.143,
+ "eval_steps_per_second": 9.156,
+ "step": 700
+ },
+ {
+ "epoch": 2.21,
+ "grad_norm": 2.342768430709839,
+ "learning_rate": 9.779127725856699e-05,
+ "loss": 0.2763,
+ "step": 710
+ },
+ {
+ "epoch": 2.24,
+ "grad_norm": 4.194737434387207,
+ "learning_rate": 9.77601246105919e-05,
+ "loss": 0.3159,
+ "step": 720
+ },
+ {
+ "epoch": 2.27,
+ "grad_norm": 4.488862991333008,
+ "learning_rate": 9.772897196261682e-05,
+ "loss": 0.2594,
+ "step": 730
+ },
+ {
+ "epoch": 2.31,
+ "grad_norm": 3.612229347229004,
+ "learning_rate": 9.769781931464175e-05,
+ "loss": 0.1651,
+ "step": 740
+ },
+ {
+ "epoch": 2.34,
+ "grad_norm": 4.490918159484863,
+ "learning_rate": 9.766666666666668e-05,
+ "loss": 0.1714,
+ "step": 750
+ },
+ {
+ "epoch": 2.37,
+ "grad_norm": 1.2064989805221558,
+ "learning_rate": 9.76355140186916e-05,
+ "loss": 0.1423,
+ "step": 760
+ },
+ {
+ "epoch": 2.4,
+ "grad_norm": 3.8275320529937744,
+ "learning_rate": 9.760436137071651e-05,
+ "loss": 0.2647,
+ "step": 770
+ },
+ {
+ "epoch": 2.43,
+ "grad_norm": 6.258784294128418,
+ "learning_rate": 9.757320872274144e-05,
+ "loss": 0.1925,
+ "step": 780
+ },
+ {
+ "epoch": 2.46,
+ "grad_norm": 6.503872394561768,
+ "learning_rate": 9.754205607476637e-05,
+ "loss": 0.3565,
+ "step": 790
+ },
+ {
+ "epoch": 2.49,
+ "grad_norm": 4.768139362335205,
+ "learning_rate": 9.751090342679128e-05,
+ "loss": 0.2615,
+ "step": 800
+ },
+ {
+ "epoch": 2.49,
+ "eval_accuracy": 0.8321775312066574,
+ "eval_f1": 0.8323189306799796,
+ "eval_loss": 0.4919604957103729,
+ "eval_precision": 0.8400335349095535,
+ "eval_recall": 0.8321775312066574,
+ "eval_runtime": 39.8278,
+ "eval_samples_per_second": 72.412,
+ "eval_steps_per_second": 9.064,
+ "step": 800
+ },
+ {
+ "epoch": 2.52,
+ "grad_norm": 3.3493497371673584,
+ "learning_rate": 9.74797507788162e-05,
+ "loss": 0.179,
+ "step": 810
+ },
+ {
+ "epoch": 2.55,
+ "grad_norm": 6.209288597106934,
+ "learning_rate": 9.744859813084113e-05,
+ "loss": 0.1872,
+ "step": 820
+ },
+ {
+ "epoch": 2.59,
+ "grad_norm": 3.6066982746124268,
+ "learning_rate": 9.741744548286604e-05,
+ "loss": 0.24,
+ "step": 830
+ },
+ {
+ "epoch": 2.62,
+ "grad_norm": 2.330963611602783,
+ "learning_rate": 9.738629283489097e-05,
+ "loss": 0.177,
+ "step": 840
+ },
+ {
+ "epoch": 2.65,
+ "grad_norm": 0.31879034638404846,
+ "learning_rate": 9.73551401869159e-05,
+ "loss": 0.1846,
+ "step": 850
+ },
+ {
+ "epoch": 2.68,
+ "grad_norm": 6.56818151473999,
+ "learning_rate": 9.732398753894082e-05,
+ "loss": 0.2731,
+ "step": 860
+ },
+ {
+ "epoch": 2.71,
+ "grad_norm": 1.255943775177002,
+ "learning_rate": 9.729283489096573e-05,
+ "loss": 0.3217,
+ "step": 870
+ },
+ {
+ "epoch": 2.74,
+ "grad_norm": 5.009977340698242,
+ "learning_rate": 9.726168224299066e-05,
+ "loss": 0.2576,
+ "step": 880
+ },
+ {
+ "epoch": 2.77,
+ "grad_norm": 1.8944584131240845,
+ "learning_rate": 9.723052959501559e-05,
+ "loss": 0.2223,
+ "step": 890
+ },
+ {
+ "epoch": 2.8,
+ "grad_norm": 3.543699026107788,
+ "learning_rate": 9.71993769470405e-05,
+ "loss": 0.2125,
+ "step": 900
+ },
+ {
+ "epoch": 2.8,
+ "eval_accuracy": 0.8491678224687933,
+ "eval_f1": 0.843224216265628,
+ "eval_loss": 0.5596445202827454,
+ "eval_precision": 0.8508841434558159,
+ "eval_recall": 0.8491678224687933,
+ "eval_runtime": 39.2828,
+ "eval_samples_per_second": 73.416,
+ "eval_steps_per_second": 9.19,
+ "step": 900
+ },
+ {
+ "epoch": 2.83,
+ "grad_norm": 5.571394920349121,
+ "learning_rate": 9.716822429906542e-05,
+ "loss": 0.2826,
+ "step": 910
+ },
+ {
+ "epoch": 2.87,
+ "grad_norm": 5.057092666625977,
+ "learning_rate": 9.713707165109035e-05,
+ "loss": 0.2647,
+ "step": 920
+ },
+ {
+ "epoch": 2.9,
+ "grad_norm": 3.190361738204956,
+ "learning_rate": 9.710591900311527e-05,
+ "loss": 0.4015,
+ "step": 930
+ },
+ {
+ "epoch": 2.93,
+ "grad_norm": 6.205695629119873,
+ "learning_rate": 9.70747663551402e-05,
+ "loss": 0.312,
+ "step": 940
+ },
+ {
+ "epoch": 2.96,
+ "grad_norm": 4.805609226226807,
+ "learning_rate": 9.704361370716511e-05,
+ "loss": 0.2098,
+ "step": 950
+ },
+ {
+ "epoch": 2.99,
+ "grad_norm": 3.2721455097198486,
+ "learning_rate": 9.701246105919004e-05,
+ "loss": 0.1959,
+ "step": 960
+ },
+ {
+ "epoch": 3.02,
+ "grad_norm": 4.190710067749023,
+ "learning_rate": 9.698130841121495e-05,
+ "loss": 0.1048,
+ "step": 970
+ },
+ {
+ "epoch": 3.05,
+ "grad_norm": 0.5176565051078796,
+ "learning_rate": 9.695015576323988e-05,
+ "loss": 0.0816,
+ "step": 980
+ },
+ {
+ "epoch": 3.08,
+ "grad_norm": 0.1512940376996994,
+ "learning_rate": 9.691900311526481e-05,
+ "loss": 0.0327,
+ "step": 990
+ },
+ {
+ "epoch": 3.12,
+ "grad_norm": 5.473361015319824,
+ "learning_rate": 9.688785046728971e-05,
+ "loss": 0.0768,
+ "step": 1000
+ },
+ {
+ "epoch": 3.12,
+ "eval_accuracy": 0.8290568654646324,
+ "eval_f1": 0.8235345080050543,
+ "eval_loss": 0.823904275894165,
+ "eval_precision": 0.8499722083747269,
+ "eval_recall": 0.8290568654646324,
+ "eval_runtime": 39.6039,
+ "eval_samples_per_second": 72.821,
+ "eval_steps_per_second": 9.115,
+ "step": 1000
+ },
+ {
+ "epoch": 3.15,
+ "grad_norm": 2.461423635482788,
+ "learning_rate": 9.685669781931464e-05,
+ "loss": 0.1258,
+ "step": 1010
+ },
+ {
+ "epoch": 3.18,
+ "grad_norm": 5.284664154052734,
+ "learning_rate": 9.682554517133957e-05,
+ "loss": 0.1124,
+ "step": 1020
+ },
+ {
+ "epoch": 3.21,
+ "grad_norm": 1.3465226888656616,
+ "learning_rate": 9.679439252336449e-05,
+ "loss": 0.0986,
+ "step": 1030
+ },
+ {
+ "epoch": 3.24,
+ "grad_norm": 3.916722297668457,
+ "learning_rate": 9.676323987538942e-05,
+ "loss": 0.0541,
+ "step": 1040
+ },
+ {
+ "epoch": 3.27,
+ "grad_norm": 0.6409209966659546,
+ "learning_rate": 9.673208722741433e-05,
+ "loss": 0.1176,
+ "step": 1050
+ },
+ {
+ "epoch": 3.3,
+ "grad_norm": 4.325451374053955,
+ "learning_rate": 9.670093457943926e-05,
+ "loss": 0.2169,
+ "step": 1060
+ },
+ {
+ "epoch": 3.33,
+ "grad_norm": 0.5127350687980652,
+ "learning_rate": 9.666978193146418e-05,
+ "loss": 0.0739,
+ "step": 1070
+ },
+ {
+ "epoch": 3.36,
+ "grad_norm": 7.527690410614014,
+ "learning_rate": 9.66386292834891e-05,
+ "loss": 0.184,
+ "step": 1080
+ },
+ {
+ "epoch": 3.4,
+ "grad_norm": 2.938415765762329,
+ "learning_rate": 9.660747663551402e-05,
+ "loss": 0.1526,
+ "step": 1090
+ },
+ {
+ "epoch": 3.43,
+ "grad_norm": 0.9604325890541077,
+ "learning_rate": 9.657632398753894e-05,
+ "loss": 0.0649,
+ "step": 1100
+ },
+ {
+ "epoch": 3.43,
+ "eval_accuracy": 0.8366851595006934,
+ "eval_f1": 0.8359690431594896,
+ "eval_loss": 0.6827093958854675,
+ "eval_precision": 0.8480904372623412,
+ "eval_recall": 0.8366851595006934,
+ "eval_runtime": 39.8316,
+ "eval_samples_per_second": 72.405,
+ "eval_steps_per_second": 9.063,
+ "step": 1100
+ },
+ {
+ "epoch": 3.46,
+ "grad_norm": 0.6656555533409119,
+ "learning_rate": 9.654517133956387e-05,
+ "loss": 0.0866,
+ "step": 1110
+ },
+ {
+ "epoch": 3.49,
+ "grad_norm": 1.8327091932296753,
+ "learning_rate": 9.65140186915888e-05,
+ "loss": 0.1117,
+ "step": 1120
+ },
+ {
+ "epoch": 3.52,
+ "grad_norm": 6.639819145202637,
+ "learning_rate": 9.648286604361371e-05,
+ "loss": 0.0607,
+ "step": 1130
+ },
+ {
+ "epoch": 3.55,
+ "grad_norm": 9.595551490783691,
+ "learning_rate": 9.645171339563863e-05,
+ "loss": 0.1603,
+ "step": 1140
+ },
+ {
+ "epoch": 3.58,
+ "grad_norm": 4.852327346801758,
+ "learning_rate": 9.642056074766356e-05,
+ "loss": 0.1289,
+ "step": 1150
+ },
+ {
+ "epoch": 3.61,
+ "grad_norm": 0.8141772150993347,
+ "learning_rate": 9.638940809968848e-05,
+ "loss": 0.2513,
+ "step": 1160
+ },
+ {
+ "epoch": 3.64,
+ "grad_norm": 2.2528672218322754,
+ "learning_rate": 9.63582554517134e-05,
+ "loss": 0.1033,
+ "step": 1170
+ },
+ {
+ "epoch": 3.68,
+ "grad_norm": 4.099298000335693,
+ "learning_rate": 9.632710280373833e-05,
+ "loss": 0.2042,
+ "step": 1180
+ },
+ {
+ "epoch": 3.71,
+ "grad_norm": 2.307119131088257,
+ "learning_rate": 9.629595015576324e-05,
+ "loss": 0.0714,
+ "step": 1190
+ },
+ {
+ "epoch": 3.74,
+ "grad_norm": 6.8174543380737305,
+ "learning_rate": 9.626479750778816e-05,
+ "loss": 0.1382,
+ "step": 1200
+ },
+ {
+ "epoch": 3.74,
+ "eval_accuracy": 0.84500693481276,
+ "eval_f1": 0.8398698444204201,
+ "eval_loss": 0.6838334798812866,
+ "eval_precision": 0.8466724060209081,
+ "eval_recall": 0.84500693481276,
+ "eval_runtime": 39.4271,
+ "eval_samples_per_second": 73.148,
+ "eval_steps_per_second": 9.156,
+ "step": 1200
+ },
+ {
+ "epoch": 3.77,
+ "grad_norm": 3.4418959617614746,
+ "learning_rate": 9.623364485981309e-05,
+ "loss": 0.0921,
+ "step": 1210
+ },
+ {
+ "epoch": 3.8,
+ "grad_norm": 3.6201603412628174,
+ "learning_rate": 9.620249221183802e-05,
+ "loss": 0.15,
+ "step": 1220
+ },
+ {
+ "epoch": 3.83,
+ "grad_norm": 6.8857550621032715,
+ "learning_rate": 9.617133956386293e-05,
+ "loss": 0.0965,
+ "step": 1230
+ },
+ {
+ "epoch": 3.86,
+ "grad_norm": 3.10553240776062,
+ "learning_rate": 9.614018691588785e-05,
+ "loss": 0.1674,
+ "step": 1240
+ },
+ {
+ "epoch": 3.89,
+ "grad_norm": 9.131609916687012,
+ "learning_rate": 9.610903426791278e-05,
+ "loss": 0.1596,
+ "step": 1250
+ },
+ {
+ "epoch": 3.93,
+ "grad_norm": 0.35134002566337585,
+ "learning_rate": 9.607788161993771e-05,
+ "loss": 0.1157,
+ "step": 1260
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 5.575935363769531,
+ "learning_rate": 9.604672897196262e-05,
+ "loss": 0.0822,
+ "step": 1270
+ },
+ {
+ "epoch": 3.99,
+ "grad_norm": 0.6137746572494507,
+ "learning_rate": 9.601557632398754e-05,
+ "loss": 0.1316,
+ "step": 1280
+ },
+ {
+ "epoch": 4.02,
+ "grad_norm": 0.34164953231811523,
+ "learning_rate": 9.598442367601247e-05,
+ "loss": 0.0739,
+ "step": 1290
+ },
+ {
+ "epoch": 4.05,
+ "grad_norm": 4.076730728149414,
+ "learning_rate": 9.595327102803738e-05,
+ "loss": 0.0486,
+ "step": 1300
+ },
+ {
+ "epoch": 4.05,
+ "eval_accuracy": 0.8578363384188626,
+ "eval_f1": 0.8494463868744596,
+ "eval_loss": 0.6367300748825073,
+ "eval_precision": 0.8548000651229425,
+ "eval_recall": 0.8578363384188626,
+ "eval_runtime": 39.1728,
+ "eval_samples_per_second": 73.623,
+ "eval_steps_per_second": 9.216,
+ "step": 1300
+ },
+ {
+ "epoch": 4.08,
+ "grad_norm": 0.4267037808895111,
+ "learning_rate": 9.592211838006231e-05,
+ "loss": 0.0499,
+ "step": 1310
+ },
+ {
+ "epoch": 4.11,
+ "grad_norm": 8.932145118713379,
+ "learning_rate": 9.589096573208724e-05,
+ "loss": 0.058,
+ "step": 1320
+ },
+ {
+ "epoch": 4.14,
+ "grad_norm": 7.81501579284668,
+ "learning_rate": 9.585981308411214e-05,
+ "loss": 0.0497,
+ "step": 1330
+ },
+ {
+ "epoch": 4.17,
+ "grad_norm": 3.25376296043396,
+ "learning_rate": 9.582866043613707e-05,
+ "loss": 0.0613,
+ "step": 1340
+ },
+ {
+ "epoch": 4.21,
+ "grad_norm": 0.009625586681067944,
+ "learning_rate": 9.5797507788162e-05,
+ "loss": 0.0882,
+ "step": 1350
+ },
+ {
+ "epoch": 4.24,
+ "grad_norm": 8.644308090209961,
+ "learning_rate": 9.576635514018693e-05,
+ "loss": 0.0729,
+ "step": 1360
+ },
+ {
+ "epoch": 4.27,
+ "grad_norm": 11.613913536071777,
+ "learning_rate": 9.573520249221185e-05,
+ "loss": 0.1285,
+ "step": 1370
+ },
+ {
+ "epoch": 4.3,
+ "grad_norm": 0.9490543603897095,
+ "learning_rate": 9.570404984423676e-05,
+ "loss": 0.0408,
+ "step": 1380
+ },
+ {
+ "epoch": 4.33,
+ "grad_norm": 2.557040214538574,
+ "learning_rate": 9.567289719626169e-05,
+ "loss": 0.0689,
+ "step": 1390
+ },
+ {
+ "epoch": 4.36,
+ "grad_norm": 7.547731399536133,
+ "learning_rate": 9.56417445482866e-05,
+ "loss": 0.1122,
+ "step": 1400
+ },
+ {
+ "epoch": 4.36,
+ "eval_accuracy": 0.8398058252427184,
+ "eval_f1": 0.833035822565054,
+ "eval_loss": 0.7330206036567688,
+ "eval_precision": 0.836759139491613,
+ "eval_recall": 0.8398058252427184,
1154
+ "eval_runtime": 39.5302,
1155
+ "eval_samples_per_second": 72.957,
1156
+ "eval_steps_per_second": 9.132,
1157
+ "step": 1400
1158
+ },
1159
+ {
1160
+ "epoch": 4.39,
1161
+ "grad_norm": 0.47195371985435486,
1162
+ "learning_rate": 9.561059190031153e-05,
1163
+ "loss": 0.0087,
1164
+ "step": 1410
1165
+ },
1166
+ {
1167
+ "epoch": 4.42,
1168
+ "grad_norm": 0.02496817521750927,
1169
+ "learning_rate": 9.557943925233645e-05,
1170
+ "loss": 0.0678,
1171
+ "step": 1420
1172
+ },
1173
+ {
1174
+ "epoch": 4.45,
1175
+ "grad_norm": 0.044717635959386826,
1176
+ "learning_rate": 9.554828660436137e-05,
1177
+ "loss": 0.0409,
1178
+ "step": 1430
1179
+ },
1180
+ {
1181
+ "epoch": 4.49,
1182
+ "grad_norm": 2.304049015045166,
1183
+ "learning_rate": 9.55171339563863e-05,
1184
+ "loss": 0.0661,
1185
+ "step": 1440
1186
+ },
1187
+ {
1188
+ "epoch": 4.52,
1189
+ "grad_norm": 11.10191822052002,
1190
+ "learning_rate": 9.548598130841122e-05,
1191
+ "loss": 0.3254,
1192
+ "step": 1450
1193
+ },
1194
+ {
1195
+ "epoch": 4.55,
1196
+ "grad_norm": 0.0031379794236272573,
1197
+ "learning_rate": 9.545482866043615e-05,
1198
+ "loss": 0.0655,
1199
+ "step": 1460
1200
+ },
1201
+ {
1202
+ "epoch": 4.58,
1203
+ "grad_norm": 1.050758719444275,
1204
+ "learning_rate": 9.542367601246105e-05,
1205
+ "loss": 0.0968,
1206
+ "step": 1470
1207
+ },
1208
+ {
1209
+ "epoch": 4.61,
1210
+ "grad_norm": 0.027871431782841682,
1211
+ "learning_rate": 9.539252336448598e-05,
1212
+ "loss": 0.1033,
1213
+ "step": 1480
1214
+ },
1215
+ {
1216
+ "epoch": 4.64,
1217
+ "grad_norm": 0.054837290197610855,
1218
+ "learning_rate": 9.536137071651091e-05,
1219
+ "loss": 0.0225,
1220
+ "step": 1490
1221
+ },
1222
+ {
1223
+ "epoch": 4.67,
1224
+ "grad_norm": 5.67630672454834,
1225
+ "learning_rate": 9.533021806853583e-05,
1226
+ "loss": 0.0302,
1227
+ "step": 1500
1228
+ },
1229
+ {
1230
+ "epoch": 4.67,
1231
+ "eval_accuracy": 0.84500693481276,
1232
+ "eval_f1": 0.8442384441117506,
1233
+ "eval_loss": 0.7136919498443604,
1234
+ "eval_precision": 0.8469740143199302,
1235
+ "eval_recall": 0.84500693481276,
1236
+ "eval_runtime": 39.1304,
1237
+ "eval_samples_per_second": 73.702,
1238
+ "eval_steps_per_second": 9.226,
1239
+ "step": 1500
1240
+ },
1241
+ {
1242
+ "epoch": 4.7,
1243
+ "grad_norm": 0.295539915561676,
1244
+ "learning_rate": 9.529906542056076e-05,
1245
+ "loss": 0.1178,
1246
+ "step": 1510
1247
+ },
1248
+ {
1249
+ "epoch": 4.74,
1250
+ "grad_norm": 0.0796700268983841,
1251
+ "learning_rate": 9.526791277258567e-05,
1252
+ "loss": 0.0481,
1253
+ "step": 1520
1254
+ },
1255
+ {
1256
+ "epoch": 4.77,
1257
+ "grad_norm": 0.1068115308880806,
1258
+ "learning_rate": 9.523676012461059e-05,
1259
+ "loss": 0.0282,
1260
+ "step": 1530
1261
+ },
1262
+ {
1263
+ "epoch": 4.8,
1264
+ "grad_norm": 1.0221561193466187,
1265
+ "learning_rate": 9.520560747663552e-05,
1266
+ "loss": 0.0538,
1267
+ "step": 1540
1268
+ },
1269
+ {
1270
+ "epoch": 4.83,
1271
+ "grad_norm": 7.369207859039307,
1272
+ "learning_rate": 9.517445482866045e-05,
1273
+ "loss": 0.1239,
1274
+ "step": 1550
1275
+ },
1276
+ {
1277
+ "epoch": 4.86,
1278
+ "grad_norm": 9.008218765258789,
1279
+ "learning_rate": 9.514330218068536e-05,
1280
+ "loss": 0.055,
1281
+ "step": 1560
1282
+ },
1283
+ {
1284
+ "epoch": 4.89,
1285
+ "grad_norm": 3.585855722427368,
1286
+ "learning_rate": 9.511214953271028e-05,
1287
+ "loss": 0.03,
1288
+ "step": 1570
1289
+ },
1290
+ {
1291
+ "epoch": 4.92,
1292
+ "grad_norm": 0.15154320001602173,
1293
+ "learning_rate": 9.50809968847352e-05,
1294
+ "loss": 0.0168,
1295
+ "step": 1580
1296
+ },
1297
+ {
1298
+ "epoch": 4.95,
1299
+ "grad_norm": 0.030903339385986328,
1300
+ "learning_rate": 9.504984423676014e-05,
1301
+ "loss": 0.1067,
1302
+ "step": 1590
1303
+ },
1304
+ {
1305
+ "epoch": 4.98,
1306
+ "grad_norm": 0.4652014672756195,
1307
+ "learning_rate": 9.501869158878505e-05,
1308
+ "loss": 0.0462,
1309
+ "step": 1600
1310
+ },
1311
+ {
1312
+ "epoch": 4.98,
1313
+ "eval_accuracy": 0.8515950069348127,
1314
+ "eval_f1": 0.8455611718307666,
1315
+ "eval_loss": 0.8198381066322327,
1316
+ "eval_precision": 0.8519412050125947,
1317
+ "eval_recall": 0.8515950069348127,
1318
+ "eval_runtime": 39.6856,
1319
+ "eval_samples_per_second": 72.671,
1320
+ "eval_steps_per_second": 9.096,
1321
+ "step": 1600
1322
+ },
1323
+ {
1324
+ "epoch": 5.02,
1325
+ "grad_norm": 14.000198364257812,
1326
+ "learning_rate": 9.498753894080997e-05,
1327
+ "loss": 0.0785,
1328
+ "step": 1610
1329
+ },
1330
+ {
1331
+ "epoch": 5.05,
1332
+ "grad_norm": 0.21171867847442627,
1333
+ "learning_rate": 9.49563862928349e-05,
1334
+ "loss": 0.019,
1335
+ "step": 1620
1336
+ },
1337
+ {
1338
+ "epoch": 5.08,
1339
+ "grad_norm": 0.004491983912885189,
1340
+ "learning_rate": 9.492523364485981e-05,
1341
+ "loss": 0.0111,
1342
+ "step": 1630
1343
+ },
1344
+ {
1345
+ "epoch": 5.11,
1346
+ "grad_norm": 0.016514340415596962,
1347
+ "learning_rate": 9.489408099688474e-05,
1348
+ "loss": 0.0897,
1349
+ "step": 1640
1350
+ },
1351
+ {
1352
+ "epoch": 5.14,
1353
+ "grad_norm": 8.40817928314209,
1354
+ "learning_rate": 9.486292834890967e-05,
1355
+ "loss": 0.0798,
1356
+ "step": 1650
1357
+ },
1358
+ {
1359
+ "epoch": 5.17,
1360
+ "grad_norm": 0.07949113100767136,
1361
+ "learning_rate": 9.483177570093458e-05,
1362
+ "loss": 0.0216,
1363
+ "step": 1660
1364
+ },
1365
+ {
1366
+ "epoch": 5.2,
1367
+ "grad_norm": 4.111806869506836,
1368
+ "learning_rate": 9.48006230529595e-05,
1369
+ "loss": 0.03,
1370
+ "step": 1670
1371
+ },
1372
+ {
1373
+ "epoch": 5.23,
1374
+ "grad_norm": 0.036615125834941864,
1375
+ "learning_rate": 9.476947040498443e-05,
1376
+ "loss": 0.0115,
1377
+ "step": 1680
1378
+ },
1379
+ {
1380
+ "epoch": 5.26,
1381
+ "grad_norm": 0.11661379039287567,
1382
+ "learning_rate": 9.473831775700936e-05,
1383
+ "loss": 0.01,
1384
+ "step": 1690
1385
+ },
1386
+ {
1387
+ "epoch": 5.3,
1388
+ "grad_norm": 0.16430974006652832,
1389
+ "learning_rate": 9.470716510903427e-05,
1390
+ "loss": 0.0109,
1391
+ "step": 1700
1392
+ },
1393
+ {
1394
+ "epoch": 5.3,
1395
+ "eval_accuracy": 0.8477808599167822,
1396
+ "eval_f1": 0.8378487316868508,
1397
+ "eval_loss": 0.8481851816177368,
1398
+ "eval_precision": 0.8383786998144745,
1399
+ "eval_recall": 0.8477808599167822,
1400
+ "eval_runtime": 39.3442,
1401
+ "eval_samples_per_second": 73.302,
1402
+ "eval_steps_per_second": 9.175,
1403
+ "step": 1700
1404
+ },
1405
+ {
1406
+ "epoch": 5.33,
1407
+ "grad_norm": 0.005747305229306221,
1408
+ "learning_rate": 9.467601246105919e-05,
1409
+ "loss": 0.0333,
1410
+ "step": 1710
1411
+ },
1412
+ {
1413
+ "epoch": 5.36,
1414
+ "grad_norm": 5.644034385681152,
1415
+ "learning_rate": 9.464485981308412e-05,
1416
+ "loss": 0.0148,
1417
+ "step": 1720
1418
+ },
1419
+ {
1420
+ "epoch": 5.39,
1421
+ "grad_norm": 0.002410849556326866,
1422
+ "learning_rate": 9.461370716510903e-05,
1423
+ "loss": 0.0518,
1424
+ "step": 1730
1425
+ },
1426
+ {
1427
+ "epoch": 5.42,
1428
+ "grad_norm": 0.002693226793780923,
1429
+ "learning_rate": 9.458255451713396e-05,
1430
+ "loss": 0.0508,
1431
+ "step": 1740
1432
+ },
1433
+ {
1434
+ "epoch": 5.45,
1435
+ "grad_norm": 10.301042556762695,
1436
+ "learning_rate": 9.455140186915888e-05,
1437
+ "loss": 0.071,
1438
+ "step": 1750
1439
+ },
1440
+ {
1441
+ "epoch": 5.48,
1442
+ "grad_norm": 0.06220326945185661,
1443
+ "learning_rate": 9.452024922118381e-05,
1444
+ "loss": 0.0568,
1445
+ "step": 1760
1446
+ },
1447
+ {
1448
+ "epoch": 5.51,
1449
+ "grad_norm": 0.07617553323507309,
1450
+ "learning_rate": 9.448909657320872e-05,
1451
+ "loss": 0.078,
1452
+ "step": 1770
1453
+ },
1454
+ {
1455
+ "epoch": 5.55,
1456
+ "grad_norm": 0.08313547819852829,
1457
+ "learning_rate": 9.445794392523365e-05,
1458
+ "loss": 0.0483,
1459
+ "step": 1780
1460
+ },
1461
+ {
1462
+ "epoch": 5.58,
1463
+ "grad_norm": 5.813291072845459,
1464
+ "learning_rate": 9.442679127725858e-05,
1465
+ "loss": 0.0369,
1466
+ "step": 1790
1467
+ },
1468
+ {
1469
+ "epoch": 5.61,
1470
+ "grad_norm": 6.673477649688721,
1471
+ "learning_rate": 9.43956386292835e-05,
1472
+ "loss": 0.0545,
1473
+ "step": 1800
1474
+ },
1475
+ {
1476
+ "epoch": 5.61,
1477
+ "eval_accuracy": 0.8498613037447988,
1478
+ "eval_f1": 0.8506454763787459,
1479
+ "eval_loss": 0.8046442270278931,
1480
+ "eval_precision": 0.8546625043916174,
1481
+ "eval_recall": 0.8498613037447988,
1482
+ "eval_runtime": 39.6817,
1483
+ "eval_samples_per_second": 72.678,
1484
+ "eval_steps_per_second": 9.097,
1485
+ "step": 1800
1486
+ },
1487
+ {
1488
+ "epoch": 5.61,
1489
+ "step": 1800,
1490
+ "total_flos": 2.2287694956200755e+18,
1491
+ "train_loss": 0.26933162180913817,
1492
+ "train_runtime": 1430.4649,
1493
+ "train_samples_per_second": 358.485,
1494
+ "train_steps_per_second": 22.44
1495
+ }
1496
+ ],
1497
+ "logging_steps": 10,
1498
+ "max_steps": 32100,
1499
+ "num_input_tokens_seen": 0,
1500
+ "num_train_epochs": 100,
1501
+ "save_steps": 100,
1502
+ "total_flos": 2.2287694956200755e+18,
1503
+ "train_batch_size": 16,
1504
+ "trial_name": null,
1505
+ "trial_params": null
1506
+ }
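Reading the log: `log_history` interleaves three kinds of record:

- Training entries (`loss`, `grad_norm`, `learning_rate`) every `logging_steps` = 10 steps.
- Evaluation entries (`eval_*` keys) every 100 steps.
- A closing run summary (`train_loss`, `train_runtime`).

With `max_steps` = 32100 over `num_train_epochs` = 100, one epoch is 321 steps. That matches `epoch` = 5.61 at step 1800.

The learning rate drops by a constant 3.1153e-08 every 10 steps. This is consistent with a linear decay from about 1e-4 to zero at step 32100. The 1e-4 starting value is inferred from the logged numbers; it is not stored in this file.

The sketch below assumes this file is a Hugging Face Trainer `trainer_state.json`. It uses a hand-copied excerpt of entries from the log above (`SAMPLE_STATE`). It splits the records, picks the best evaluation, and recovers the implied peak learning rate from the slope. `load_state`, `split_log_history`, `best_eval` and `lr_slope_per_step` are hypothetical helpers written for illustration, not Transformers APIs.

```python
import json
from typing import Any

# Excerpt copied from the log above (hypothetical stand-in for the full file).
SAMPLE_STATE: dict[str, Any] = {
    "log_history": [
        {"epoch": 4.05, "grad_norm": 4.076730728149414, "learning_rate": 9.595327102803738e-05, "loss": 0.0486, "step": 1300},
        {"epoch": 4.05, "eval_accuracy": 0.8578363384188626, "eval_loss": 0.6367300748825073, "step": 1300},
        {"epoch": 4.36, "grad_norm": 7.547731399536133, "learning_rate": 9.56417445482866e-05, "loss": 0.1122, "step": 1400},
        {"epoch": 4.36, "eval_accuracy": 0.8398058252427184, "eval_loss": 0.7330206036567688, "step": 1400},
        {"epoch": 5.61, "grad_norm": 6.673477649688721, "learning_rate": 9.43956386292835e-05, "loss": 0.0545, "step": 1800},
        {"epoch": 5.61, "eval_accuracy": 0.8498613037447988, "eval_loss": 0.8046442270278931, "step": 1800},
        {"epoch": 5.61, "step": 1800, "train_loss": 0.26933162180913817, "train_runtime": 1430.4649},
    ],
    "max_steps": 32100,
}


def load_state(path: str) -> dict[str, Any]:
    """Read a trainer-state JSON file from disk (path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_log_history(state: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Separate training-log records from evaluation records."""
    train, evals = [], []
    for record in state["log_history"]:
        if "eval_loss" in record:
            evals.append(record)
        elif "loss" in record:
            train.append(record)
        # Anything else (e.g. the final summary with train_runtime) is skipped.
    return train, evals


def best_eval(evals: list[dict], metric: str = "eval_accuracy",
              greater_is_better: bool = True) -> dict:
    """Return the evaluation record with the best value of `metric`."""
    pick = max if greater_is_better else min
    return pick(evals, key=lambda r: r[metric])


def lr_slope_per_step(train: list[dict]) -> float:
    """Average learning-rate change per step between first and last training logs."""
    first, last = train[0], train[-1]
    return (last["learning_rate"] - first["learning_rate"]) / (last["step"] - first["step"])


if __name__ == "__main__":
    train, evals = split_log_history(SAMPLE_STATE)
    best = best_eval(evals)
    slope = lr_slope_per_step(train)
    # Under linear decay to zero at max_steps, -slope * max_steps is the peak LR.
    implied_peak = -slope * SAMPLE_STATE["max_steps"]
    print(f"best eval_accuracy {best['eval_accuracy']:.4f} at step {best['step']}")
    print(f"implied peak learning rate {implied_peak:.3e}")
```

On the full file, `best_eval(evals, "eval_loss", greater_is_better=False)` shows which checkpoint a loss-based selection would pick.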