vuongnhathien committed on
Commit
c3e0188
1 Parent(s): a488e2d

End of training

README.md CHANGED
@@ -22,7 +22,7 @@ model-index:
  metrics:
  - name: Accuracy
    type: accuracy
-   value: 0.94831013916501
+   value: 0.9378968253968254
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -32,8 +32,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [facebook/convnextv2-base-22k-384](https://huggingface.co/facebook/convnextv2-base-22k-384) on the imagefolder dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.3120
- - Accuracy: 0.9483
+ - Loss: 0.2863
+ - Accuracy: 0.9379
 
  ## Model description
 
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 10.0,
+     "total_flos": 4.09349935387607e+19,
+     "train_loss": 0.2596938943081926,
+     "train_runtime": 18130.6882,
+     "train_samples_per_second": 9.697,
+     "train_steps_per_second": 1.212
+ }
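The throughput figures above can be cross-checked with a little arithmetic. The sketch below is only a sanity check and not part of the commit: `global_step` and the epoch count are copied from trainer_state.json in this same commit, the variable names are illustrative, and reading the samples-per-step ratio as a batch size assumes no samples were dropped or padded.

```python
# Values copied from all_results.json in this commit.
train_runtime = 18130.6882          # seconds
train_samples_per_second = 9.697
train_steps_per_second = 1.212

# Values copied from trainer_state.json in this commit.
global_step = 21980
num_epochs = 10

# Steps implied by runtime and step rate; the logged rate is rounded,
# so a small gap from global_step is expected.
implied_steps = train_runtime * train_steps_per_second

# Samples consumed per optimizer step. Treating this as the effective
# batch size is an assumption (no dropped or padded samples).
implied_batch_size = train_samples_per_second / train_steps_per_second

steps_per_epoch = global_step / num_epochs
hours = train_runtime / 3600

print(f"implied steps:      {implied_steps:.0f} (logged: {global_step})")
print(f"implied batch size: {implied_batch_size:.2f}")
print(f"steps per epoch:    {steps_per_epoch:.0f}")
print(f"wall-clock hours:   {hours:.2f}")
```

The implied batch size of about 8 agrees with the `batch-8` suffix in the checkpoint directory name recorded in trainer_state.json.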
runs/May26_18-10-54_f45edd2af2fb/events.out.tfevents.1716765512.f45edd2af2fb.26.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a97cec914e57d6d5bd2bd535e26523842b8d24ef5c4f9c0edb733ad4083eb64c
+ size 417
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 10.0,
+     "total_flos": 4.09349935387607e+19,
+     "train_loss": 0.2596938943081926,
+     "train_runtime": 18130.6882,
+     "train_samples_per_second": 9.697,
+     "train_steps_per_second": 1.212
+ }
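The `learning_rate` values logged in trainer_state.json fall smoothly from just under 3e-05, which looks consistent with a half-cosine decay over the full run. The sketch below checks that guess against a few logged points. The peak of 3e-5 is inferred from the checkpoint directory name, the absence of warmup is an assumption, and `half_cosine_lr` is a hypothetical helper written for this check, not the Trainer's own scheduler code.

```python
import math


def half_cosine_lr(step, peak_lr=3e-5, total_steps=21980):
    """Half-cosine decay from peak_lr at step 0 to 0 at total_steps.

    Assumes no warmup phase; peak_lr and total_steps are inferred
    from trainer_state.json, not taken from a training config.
    """
    progress = step / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from log_history in trainer_state.json.
logged = {
    100: 2.999846786074732e-05,
    2200: 2.9264522131293818e-05,
    11000: 1.4978560566988603e-05,
}

for step, lr in logged.items():
    predicted = half_cosine_lr(step)
    matches = math.isclose(predicted, lr, rel_tol=1e-6)
    print(f"step {step:>5}: predicted {predicted:.6e}, logged {lr:.6e}, match={matches}")
```

If the guess holds, the rate passes through half its peak near step 10990, the midpoint of the 21980 logged steps.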
trainer_state.json ADDED
@@ -0,0 +1,1653 @@
+ {
+   "best_metric": 0.3029824197292328,
+   "best_model_checkpoint": "./convnext-base-3e-5-batch-8/checkpoint-8792",
+   "epoch": 10.0,
+   "eval_steps": 500,
+   "global_step": 21980,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.05,
+       "grad_norm": 20.464111328125,
+       "learning_rate": 2.999846786074732e-05,
+       "loss": 2.7431,
+       "step": 100
+     },
+     {
+       "epoch": 0.09,
+       "grad_norm": 26.436141967773438,
+       "learning_rate": 2.999387175598269e-05,
+       "loss": 1.6104,
+       "step": 200
+     },
+     {
+       "epoch": 0.14,
+       "grad_norm": 54.28024673461914,
+       "learning_rate": 2.998621262462245e-05,
+       "loss": 1.2012,
+       "step": 300
+     },
+     {
+       "epoch": 0.18,
+       "grad_norm": 53.153194427490234,
+       "learning_rate": 2.9975492031314045e-05,
+       "loss": 0.974,
+       "step": 400
+     },
+     {
+       "epoch": 0.23,
+       "grad_norm": 38.65731430053711,
+       "learning_rate": 2.996171216611638e-05,
+       "loss": 0.8435,
+       "step": 500
+     },
+     {
+       "epoch": 0.27,
+       "grad_norm": 23.02427101135254,
+       "learning_rate": 2.994487584405244e-05,
+       "loss": 0.7449,
+       "step": 600
+     },
+     {
+       "epoch": 0.32,
+       "grad_norm": 19.776226043701172,
+       "learning_rate": 2.992498650453421e-05,
+       "loss": 0.6626,
+       "step": 700
+     },
+     {
+       "epoch": 0.36,
+       "grad_norm": 19.967565536499023,
+       "learning_rate": 2.990204821066006e-05,
+       "loss": 0.5485,
+       "step": 800
+     },
+     {
+       "epoch": 0.41,
+       "grad_norm": 32.89971923828125,
+       "learning_rate": 2.9876065648384715e-05,
+       "loss": 0.6233,
+       "step": 900
+     },
+     {
+       "epoch": 0.45,
+       "grad_norm": 23.158945083618164,
+       "learning_rate": 2.984704412556199e-05,
+       "loss": 0.595,
+       "step": 1000
+     },
+     {
+       "epoch": 0.5,
+       "grad_norm": 33.01531219482422,
+       "learning_rate": 2.981498957086044e-05,
+       "loss": 0.5844,
+       "step": 1100
+     },
+     {
+       "epoch": 0.55,
+       "grad_norm": 36.02910614013672,
+       "learning_rate": 2.977990853255228e-05,
+       "loss": 0.4787,
+       "step": 1200
+     },
+     {
+       "epoch": 0.59,
+       "grad_norm": 25.670297622680664,
+       "learning_rate": 2.974180817717561e-05,
+       "loss": 0.6183,
+       "step": 1300
+     },
+     {
+       "epoch": 0.64,
+       "grad_norm": 16.535247802734375,
+       "learning_rate": 2.970069628807043e-05,
+       "loss": 0.509,
+       "step": 1400
+     },
+     {
+       "epoch": 0.68,
+       "grad_norm": 20.26650047302246,
+       "learning_rate": 2.965658126378862e-05,
+       "loss": 0.5619,
+       "step": 1500
+     },
+     {
+       "epoch": 0.73,
+       "grad_norm": 51.08666229248047,
+       "learning_rate": 2.9609472116378222e-05,
+       "loss": 0.4814,
+       "step": 1600
+     },
+     {
+       "epoch": 0.77,
+       "grad_norm": 13.978100776672363,
+       "learning_rate": 2.955937846954242e-05,
+       "loss": 0.4695,
+       "step": 1700
+     },
+     {
+       "epoch": 0.82,
+       "grad_norm": 2.855776071548462,
+       "learning_rate": 2.9506310556673573e-05,
+       "loss": 0.5061,
+       "step": 1800
+     },
+     {
+       "epoch": 0.86,
+       "grad_norm": 5.468057155609131,
+       "learning_rate": 2.945027921876265e-05,
+       "loss": 0.5227,
+       "step": 1900
+     },
+     {
+       "epoch": 0.91,
+       "grad_norm": 2.113645315170288,
+       "learning_rate": 2.9391295902184625e-05,
+       "loss": 0.4312,
+       "step": 2000
+     },
+     {
+       "epoch": 0.96,
+       "grad_norm": 37.98606872558594,
+       "learning_rate": 2.93293726563601e-05,
+       "loss": 0.5851,
+       "step": 2100
+     },
+     {
+       "epoch": 1.0,
+       "eval_accuracy": 0.8918489065606362,
+       "eval_loss": 0.38079071044921875,
+       "eval_runtime": 114.5109,
+       "eval_samples_per_second": 21.963,
+       "eval_steps_per_second": 2.751,
+       "step": 2198
+     },
+     {
+       "epoch": 1.0,
+       "grad_norm": 20.0971622467041,
+       "learning_rate": 2.9264522131293818e-05,
+       "loss": 0.5127,
+       "step": 2200
+     },
+     {
+       "epoch": 1.05,
+       "grad_norm": 26.824159622192383,
+       "learning_rate": 2.919675757499045e-05,
+       "loss": 0.3676,
+       "step": 2300
+     },
+     {
+       "epoch": 1.09,
+       "grad_norm": 14.176966667175293,
+       "learning_rate": 2.9126092830748217e-05,
+       "loss": 0.3944,
+       "step": 2400
+     },
+     {
+       "epoch": 1.14,
+       "grad_norm": 31.566120147705078,
+       "learning_rate": 2.9052542334330916e-05,
+       "loss": 0.3678,
+       "step": 2500
+     },
+     {
+       "epoch": 1.18,
+       "grad_norm": 30.576416015625,
+       "learning_rate": 2.897612111101888e-05,
+       "loss": 0.465,
+       "step": 2600
+     },
+     {
+       "epoch": 1.23,
+       "grad_norm": 52.94912338256836,
+       "learning_rate": 2.889684477253959e-05,
+       "loss": 0.3741,
+       "step": 2700
+     },
+     {
+       "epoch": 1.27,
+       "grad_norm": 26.88067054748535,
+       "learning_rate": 2.8814729513878365e-05,
+       "loss": 0.3993,
+       "step": 2800
+     },
+     {
+       "epoch": 1.32,
+       "grad_norm": 26.82840919494629,
+       "learning_rate": 2.8729792109970015e-05,
+       "loss": 0.3801,
+       "step": 2900
+     },
+     {
+       "epoch": 1.36,
+       "grad_norm": 52.661399841308594,
+       "learning_rate": 2.864204991227195e-05,
+       "loss": 0.326,
+       "step": 3000
+     },
+     {
+       "epoch": 1.41,
+       "grad_norm": 11.977348327636719,
+       "learning_rate": 2.855152084521953e-05,
+       "loss": 0.3646,
+       "step": 3100
+     },
+     {
+       "epoch": 1.46,
+       "grad_norm": 28.014543533325195,
+       "learning_rate": 2.8458223402564366e-05,
+       "loss": 0.3409,
+       "step": 3200
+     },
+     {
+       "epoch": 1.5,
+       "grad_norm": 1.3806567192077637,
+       "learning_rate": 2.836217664359634e-05,
+       "loss": 0.467,
+       "step": 3300
+     },
+     {
+       "epoch": 1.55,
+       "grad_norm": 33.71268844604492,
+       "learning_rate": 2.826340018925006e-05,
+       "loss": 0.4337,
+       "step": 3400
+     },
+     {
+       "epoch": 1.59,
+       "grad_norm": 34.983802795410156,
+       "learning_rate": 2.8161914218096568e-05,
+       "loss": 0.3762,
+       "step": 3500
+     },
+     {
+       "epoch": 1.64,
+       "grad_norm": 23.66938591003418,
+       "learning_rate": 2.8057739462221215e-05,
+       "loss": 0.4465,
+       "step": 3600
+     },
+     {
+       "epoch": 1.68,
+       "grad_norm": 16.07572364807129,
+       "learning_rate": 2.7950897202988338e-05,
+       "loss": 0.3359,
+       "step": 3700
+     },
+     {
+       "epoch": 1.73,
+       "grad_norm": 25.094499588012695,
+       "learning_rate": 2.7841409266693838e-05,
+       "loss": 0.4376,
+       "step": 3800
+     },
+     {
+       "epoch": 1.77,
+       "grad_norm": 44.122901916503906,
+       "learning_rate": 2.7729298020106363e-05,
+       "loss": 0.3529,
+       "step": 3900
+     },
+     {
+       "epoch": 1.82,
+       "grad_norm": 29.057430267333984,
+       "learning_rate": 2.761458636589813e-05,
+       "loss": 0.3423,
+       "step": 4000
+     },
+     {
+       "epoch": 1.87,
+       "grad_norm": 39.68132019042969,
+       "learning_rate": 2.7497297737966217e-05,
+       "loss": 0.3745,
+       "step": 4100
+     },
+     {
+       "epoch": 1.91,
+       "grad_norm": 25.461807250976562,
+       "learning_rate": 2.7377456096645395e-05,
+       "loss": 0.4214,
+       "step": 4200
+     },
+     {
+       "epoch": 1.96,
+       "grad_norm": 34.991153717041016,
+       "learning_rate": 2.725508592381337e-05,
+       "loss": 0.3975,
+       "step": 4300
+     },
+     {
+       "epoch": 2.0,
+       "eval_accuracy": 0.9093439363817097,
+       "eval_loss": 0.3232283592224121,
+       "eval_runtime": 114.6327,
+       "eval_samples_per_second": 21.94,
+       "eval_steps_per_second": 2.748,
+       "step": 4396
+     },
+     {
+       "epoch": 2.0,
+       "grad_norm": 23.15580177307129,
+       "learning_rate": 2.7130212217889484e-05,
+       "loss": 0.3681,
+       "step": 4400
+     },
+     {
+       "epoch": 2.05,
+       "grad_norm": 28.531185150146484,
+       "learning_rate": 2.7002860488727944e-05,
+       "loss": 0.3262,
+       "step": 4500
+     },
+     {
+       "epoch": 2.09,
+       "grad_norm": 38.52848434448242,
+       "learning_rate": 2.6873056752406504e-05,
+       "loss": 0.3097,
+       "step": 4600
+     },
+     {
+       "epoch": 2.14,
+       "grad_norm": 1.2146570682525635,
+       "learning_rate": 2.6740827525911766e-05,
+       "loss": 0.4274,
+       "step": 4700
+     },
+     {
+       "epoch": 2.18,
+       "grad_norm": 19.732980728149414,
+       "learning_rate": 2.6606199821722166e-05,
+       "loss": 0.268,
+       "step": 4800
+     },
+     {
+       "epoch": 2.23,
+       "grad_norm": 15.875640869140625,
+       "learning_rate": 2.646920114228972e-05,
+       "loss": 0.2734,
+       "step": 4900
+     },
+     {
+       "epoch": 2.27,
+       "grad_norm": 18.1243953704834,
+       "learning_rate": 2.632985947442167e-05,
+       "loss": 0.2739,
+       "step": 5000
+     },
+     {
+       "epoch": 2.32,
+       "grad_norm": 17.237104415893555,
+       "learning_rate": 2.6188203283563198e-05,
+       "loss": 0.2917,
+       "step": 5100
+     },
+     {
+       "epoch": 2.37,
+       "grad_norm": 18.371570587158203,
+       "learning_rate": 2.6044261507982356e-05,
+       "loss": 0.3551,
+       "step": 5200
+     },
+     {
+       "epoch": 2.41,
+       "grad_norm": 14.930416107177734,
+       "learning_rate": 2.589806355285841e-05,
+       "loss": 0.3264,
+       "step": 5300
+     },
+     {
+       "epoch": 2.46,
+       "grad_norm": 30.236698150634766,
+       "learning_rate": 2.5749639284274782e-05,
+       "loss": 0.3388,
+       "step": 5400
+     },
+     {
+       "epoch": 2.5,
+       "grad_norm": 0.13715609908103943,
+       "learning_rate": 2.5599019023117872e-05,
+       "loss": 0.2575,
+       "step": 5500
+     },
+     {
+       "epoch": 2.55,
+       "grad_norm": 6.84519624710083,
+       "learning_rate": 2.5446233538882924e-05,
+       "loss": 0.3263,
+       "step": 5600
+     },
+     {
+       "epoch": 2.59,
+       "grad_norm": 58.51634979248047,
+       "learning_rate": 2.5291314043388295e-05,
+       "loss": 0.3716,
+       "step": 5700
+     },
+     {
+       "epoch": 2.64,
+       "grad_norm": 0.961408257484436,
+       "learning_rate": 2.513429218439932e-05,
+       "loss": 0.3012,
+       "step": 5800
+     },
+     {
+       "epoch": 2.68,
+       "grad_norm": 13.59073257446289,
+       "learning_rate": 2.497520003916316e-05,
+       "loss": 0.2911,
+       "step": 5900
+     },
+     {
+       "epoch": 2.73,
+       "grad_norm": 12.26279354095459,
+       "learning_rate": 2.4814070107855878e-05,
+       "loss": 0.279,
+       "step": 6000
+     },
+     {
+       "epoch": 2.78,
+       "grad_norm": 30.789432525634766,
+       "learning_rate": 2.465093530694315e-05,
+       "loss": 0.29,
+       "step": 6100
+     },
+     {
+       "epoch": 2.82,
+       "grad_norm": 0.28742390871047974,
+       "learning_rate": 2.448582896245591e-05,
+       "loss": 0.3051,
+       "step": 6200
+     },
+     {
+       "epoch": 2.87,
+       "grad_norm": 2.97086501121521,
+       "learning_rate": 2.4318784803182317e-05,
+       "loss": 0.2735,
+       "step": 6300
+     },
+     {
+       "epoch": 2.91,
+       "grad_norm": 32.147727966308594,
+       "learning_rate": 2.4149836953777488e-05,
+       "loss": 0.2992,
+       "step": 6400
+     },
+     {
+       "epoch": 2.96,
+       "grad_norm": 28.64170265197754,
+       "learning_rate": 2.3979019927792315e-05,
+       "loss": 0.3337,
+       "step": 6500
+     },
+     {
+       "epoch": 3.0,
+       "eval_accuracy": 0.925248508946322,
+       "eval_loss": 0.3209765553474426,
+       "eval_runtime": 113.6693,
+       "eval_samples_per_second": 22.126,
+       "eval_steps_per_second": 2.771,
+       "step": 6594
+     },
+     {
+       "epoch": 3.0,
+       "grad_norm": 14.05492115020752,
+       "learning_rate": 2.3806368620622876e-05,
+       "loss": 0.3353,
+       "step": 6600
+     },
+     {
+       "epoch": 3.05,
+       "grad_norm": 5.791962623596191,
+       "learning_rate": 2.3631918302381803e-05,
+       "loss": 0.2148,
+       "step": 6700
+     },
+     {
+       "epoch": 3.09,
+       "grad_norm": 0.8230623602867126,
+       "learning_rate": 2.345570461069312e-05,
+       "loss": 0.267,
+       "step": 6800
+     },
+     {
+       "epoch": 3.14,
+       "grad_norm": 0.2485540211200714,
+       "learning_rate": 2.327776354341202e-05,
+       "loss": 0.2472,
+       "step": 6900
+     },
+     {
+       "epoch": 3.18,
+       "grad_norm": 0.06697408854961395,
+       "learning_rate": 2.3098131451271016e-05,
+       "loss": 0.2568,
+       "step": 7000
+     },
+     {
+       "epoch": 3.23,
+       "grad_norm": 5.22568941116333,
+       "learning_rate": 2.291684503045402e-05,
+       "loss": 0.2392,
+       "step": 7100
+     },
+     {
+       "epoch": 3.28,
+       "grad_norm": 58.771671295166016,
+       "learning_rate": 2.2733941315099883e-05,
+       "loss": 0.2867,
+       "step": 7200
+     },
+     {
+       "epoch": 3.32,
+       "grad_norm": 21.552099227905273,
+       "learning_rate": 2.2549457669736836e-05,
+       "loss": 0.2652,
+       "step": 7300
+     },
+     {
+       "epoch": 3.37,
+       "grad_norm": 1.265632152557373,
+       "learning_rate": 2.2363431781649483e-05,
+       "loss": 0.2284,
+       "step": 7400
+     },
+     {
+       "epoch": 3.41,
+       "grad_norm": 5.533514499664307,
+       "learning_rate": 2.2175901653179847e-05,
+       "loss": 0.232,
+       "step": 7500
+     },
+     {
+       "epoch": 3.46,
+       "grad_norm": 0.05318501219153404,
+       "learning_rate": 2.1986905593964048e-05,
+       "loss": 0.2094,
+       "step": 7600
+     },
+     {
+       "epoch": 3.5,
+       "grad_norm": 20.877737045288086,
+       "learning_rate": 2.1796482213106203e-05,
+       "loss": 0.3036,
+       "step": 7700
+     },
+     {
+       "epoch": 3.55,
+       "grad_norm": 43.25149917602539,
+       "learning_rate": 2.1604670411291174e-05,
+       "loss": 0.2388,
+       "step": 7800
+     },
+     {
+       "epoch": 3.59,
+       "grad_norm": 15.92627239227295,
+       "learning_rate": 2.1411509372837724e-05,
+       "loss": 0.3357,
+       "step": 7900
+     },
+     {
+       "epoch": 3.64,
+       "grad_norm": 12.71852970123291,
+       "learning_rate": 2.121703855769373e-05,
+       "loss": 0.2069,
+       "step": 8000
+     },
+     {
+       "epoch": 3.69,
+       "grad_norm": 12.913580894470215,
+       "learning_rate": 2.102129769337511e-05,
+       "loss": 0.2867,
+       "step": 8100
+     },
+     {
+       "epoch": 3.73,
+       "grad_norm": 0.06732411682605743,
+       "learning_rate": 2.0824326766850072e-05,
+       "loss": 0.28,
+       "step": 8200
+     },
+     {
+       "epoch": 3.78,
+       "grad_norm": 12.845504760742188,
+       "learning_rate": 2.0626166016370375e-05,
+       "loss": 0.2245,
+       "step": 8300
+     },
+     {
+       "epoch": 3.82,
+       "grad_norm": 0.00928166788071394,
+       "learning_rate": 2.042685592325123e-05,
+       "loss": 0.2359,
+       "step": 8400
+     },
+     {
+       "epoch": 3.87,
+       "grad_norm": 9.65197467803955,
+       "learning_rate": 2.0226437203601602e-05,
+       "loss": 0.2984,
+       "step": 8500
+     },
+     {
+       "epoch": 3.91,
+       "grad_norm": 25.082759857177734,
+       "learning_rate": 2.0024950800006463e-05,
+       "loss": 0.2164,
+       "step": 8600
+     },
+     {
+       "epoch": 3.96,
+       "grad_norm": 9.887031555175781,
+       "learning_rate": 1.9822437873162863e-05,
+       "loss": 0.2279,
+       "step": 8700
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.9308151093439364,
+       "eval_loss": 0.3029824197292328,
+       "eval_runtime": 112.1873,
+       "eval_samples_per_second": 22.418,
+       "eval_steps_per_second": 2.808,
+       "step": 8792
+     },
+     {
+       "epoch": 4.0,
+       "grad_norm": 6.438531398773193,
+       "learning_rate": 1.961893979347137e-05,
+       "loss": 0.2683,
+       "step": 8800
+     },
+     {
+       "epoch": 4.05,
+       "grad_norm": 0.2623905837535858,
+       "learning_rate": 1.9414498132584773e-05,
+       "loss": 0.2297,
+       "step": 8900
+     },
+     {
+       "epoch": 4.09,
+       "grad_norm": 13.5099515914917,
+       "learning_rate": 1.9209154654915524e-05,
+       "loss": 0.1985,
+       "step": 9000
+     },
+     {
+       "epoch": 4.14,
+       "grad_norm": 0.006642199121415615,
+       "learning_rate": 1.900295130910396e-05,
+       "loss": 0.2145,
+       "step": 9100
+     },
+     {
+       "epoch": 4.19,
+       "grad_norm": 41.91756057739258,
+       "learning_rate": 1.879593021944875e-05,
+       "loss": 0.2221,
+       "step": 9200
+     },
+     {
+       "epoch": 4.23,
+       "grad_norm": 21.074771881103516,
+       "learning_rate": 1.8588133677301595e-05,
+       "loss": 0.2129,
+       "step": 9300
+     },
+     {
+       "epoch": 4.28,
+       "grad_norm": 0.3562418520450592,
+       "learning_rate": 1.837960413242765e-05,
+       "loss": 0.2438,
+       "step": 9400
+     },
+     {
+       "epoch": 4.32,
+       "grad_norm": 40.30086898803711,
+       "learning_rate": 1.817038418433373e-05,
+       "loss": 0.2036,
+       "step": 9500
+     },
+     {
+       "epoch": 4.37,
+       "grad_norm": 0.014717689715325832,
+       "learning_rate": 1.796051657356582e-05,
+       "loss": 0.2003,
+       "step": 9600
+     },
+     {
+       "epoch": 4.41,
+       "grad_norm": 17.778074264526367,
+       "learning_rate": 1.7750044172977838e-05,
+       "loss": 0.2179,
+       "step": 9700
+     },
+     {
+       "epoch": 4.46,
+       "grad_norm": 5.801839351654053,
+       "learning_rate": 1.7539009978973312e-05,
+       "loss": 0.2142,
+       "step": 9800
+     },
+     {
+       "epoch": 4.5,
+       "grad_norm": 5.123954772949219,
+       "learning_rate": 1.7327457102721887e-05,
+       "loss": 0.1686,
+       "step": 9900
+     },
+     {
+       "epoch": 4.55,
+       "grad_norm": 4.904812812805176,
+       "learning_rate": 1.711542876135233e-05,
+       "loss": 0.2073,
+       "step": 10000
+     },
+     {
+       "epoch": 4.6,
+       "grad_norm": 15.447718620300293,
+       "learning_rate": 1.6902968269123902e-05,
+       "loss": 0.2276,
+       "step": 10100
+     },
+     {
+       "epoch": 4.64,
+       "grad_norm": 0.12395219504833221,
+       "learning_rate": 1.669011902857791e-05,
+       "loss": 0.1882,
+       "step": 10200
+     },
+     {
+       "epoch": 4.69,
+       "grad_norm": 17.49830436706543,
+       "learning_rate": 1.6476924521671194e-05,
+       "loss": 0.2127,
+       "step": 10300
+     },
+     {
+       "epoch": 4.73,
+       "grad_norm": 10.882052421569824,
+       "learning_rate": 1.6263428300893422e-05,
+       "loss": 0.2202,
+       "step": 10400
+     },
+     {
+       "epoch": 4.78,
+       "grad_norm": 37.238731384277344,
+       "learning_rate": 1.604967398036996e-05,
+       "loss": 0.1663,
+       "step": 10500
+     },
+     {
+       "epoch": 4.82,
+       "grad_norm": 9.180378913879395,
+       "learning_rate": 1.5835705226952112e-05,
+       "loss": 0.2547,
+       "step": 10600
+     },
+     {
+       "epoch": 4.87,
+       "grad_norm": 0.04440607875585556,
+       "learning_rate": 1.5621565751296676e-05,
+       "loss": 0.2186,
+       "step": 10700
+     },
+     {
+       "epoch": 4.91,
+       "grad_norm": 0.13343572616577148,
+       "learning_rate": 1.540729929893649e-05,
+       "loss": 0.2779,
+       "step": 10800
+     },
+     {
+       "epoch": 4.96,
+       "grad_norm": 1.3157250881195068,
+       "learning_rate": 1.5192949641343834e-05,
+       "loss": 0.1696,
+       "step": 10900
+     },
+     {
+       "epoch": 5.0,
+       "eval_accuracy": 0.9292246520874752,
+       "eval_loss": 0.34775686264038086,
+       "eval_runtime": 111.4119,
+       "eval_samples_per_second": 22.574,
+       "eval_steps_per_second": 2.827,
+       "step": 10990
+     },
+     {
+       "epoch": 5.0,
+       "grad_norm": 0.44680455327033997,
+       "learning_rate": 1.4978560566988603e-05,
+       "loss": 0.23,
+       "step": 11000
+     },
+     {
+       "epoch": 5.05,
+       "grad_norm": 0.041424062103033066,
+       "learning_rate": 1.4764175872392958e-05,
+       "loss": 0.1913,
+       "step": 11100
+     },
+     {
+       "epoch": 5.1,
+       "grad_norm": 2.6414008140563965,
+       "learning_rate": 1.454983935318433e-05,
+       "loss": 0.1374,
+       "step": 11200
+     },
+     {
+       "epoch": 5.14,
+       "grad_norm": 0.396395206451416,
+       "learning_rate": 1.433559479514864e-05,
+       "loss": 0.1807,
+       "step": 11300
+     },
+     {
+       "epoch": 5.19,
+       "grad_norm": 8.14771842956543,
+       "learning_rate": 1.4121485965285485e-05,
+       "loss": 0.1888,
+       "step": 11400
+     },
+     {
+       "epoch": 5.23,
+       "grad_norm": 0.3801160454750061,
+       "learning_rate": 1.3907556602867213e-05,
+       "loss": 0.1838,
+       "step": 11500
+     },
+     {
+       "epoch": 5.28,
+       "grad_norm": 1.0655605792999268,
+       "learning_rate": 1.3693850410503614e-05,
+       "loss": 0.1483,
+       "step": 11600
+     },
+     {
+       "epoch": 5.32,
+       "grad_norm": 0.37717050313949585,
+       "learning_rate": 1.3480411045214147e-05,
+       "loss": 0.1635,
+       "step": 11700
+     },
+     {
+       "epoch": 5.37,
+       "grad_norm": 0.017707068473100662,
+       "learning_rate": 1.326728210950942e-05,
+       "loss": 0.1871,
+       "step": 11800
+     },
+     {
+       "epoch": 5.41,
+       "grad_norm": 0.01935901865363121,
+       "learning_rate": 1.3054507142483875e-05,
+       "loss": 0.1479,
+       "step": 11900
+     },
+     {
+       "epoch": 5.46,
+       "grad_norm": 0.2556498050689697,
+       "learning_rate": 1.2842129610921378e-05,
+       "loss": 0.1754,
+       "step": 12000
+     },
+     {
+       "epoch": 5.51,
+       "grad_norm": 1.742725133895874,
+       "learning_rate": 1.2630192900415582e-05,
+       "loss": 0.2713,
+       "step": 12100
+     },
+     {
+       "epoch": 5.55,
+       "grad_norm": 40.924644470214844,
+       "learning_rate": 1.2418740306506923e-05,
+       "loss": 0.1909,
+       "step": 12200
+     },
+     {
+       "epoch": 5.6,
+       "grad_norm": 49.58342361450195,
+       "learning_rate": 1.2207815025837977e-05,
+       "loss": 0.159,
+       "step": 12300
+     },
+     {
+       "epoch": 5.64,
+       "grad_norm": 0.03460393100976944,
+       "learning_rate": 1.1997460147328984e-05,
+       "loss": 0.1559,
+       "step": 12400
+     },
+     {
+       "epoch": 5.69,
+       "grad_norm": 0.0873652920126915,
+       "learning_rate": 1.178771864337546e-05,
+       "loss": 0.1713,
+       "step": 12500
+     },
+     {
+       "epoch": 5.73,
+       "grad_norm": 0.1891886442899704,
+       "learning_rate": 1.1578633361069559e-05,
+       "loss": 0.2004,
+       "step": 12600
+     },
+     {
+       "epoch": 5.78,
+       "grad_norm": 0.3393521010875702,
+       "learning_rate": 1.1370247013447035e-05,
+       "loss": 0.1993,
+       "step": 12700
+     },
+     {
+       "epoch": 5.82,
+       "grad_norm": 3.934659719467163,
+       "learning_rate": 1.1162602170761611e-05,
+       "loss": 0.1507,
+       "step": 12800
+     },
+     {
+       "epoch": 5.87,
+       "grad_norm": 1.6851441860198975,
+       "learning_rate": 1.095574125178849e-05,
+       "loss": 0.1649,
+       "step": 12900
+     },
+     {
+       "epoch": 5.91,
+       "grad_norm": 74.29869079589844,
+       "learning_rate": 1.0749706515158863e-05,
+       "loss": 0.2056,
+       "step": 13000
+     },
+     {
+       "epoch": 5.96,
+       "grad_norm": 23.88290786743164,
+       "learning_rate": 1.0544540050727048e-05,
+       "loss": 0.1658,
+       "step": 13100
+     },
+     {
+       "epoch": 6.0,
+       "eval_accuracy": 0.9427435387673956,
+       "eval_loss": 0.3084171712398529,
+       "eval_runtime": 111.2598,
+       "eval_samples_per_second": 22.605,
+       "eval_steps_per_second": 2.831,
+       "step": 13188
+     },
982
+ {
983
+ "epoch": 6.01,
984
+ "grad_norm": 15.39907169342041,
985
+ "learning_rate": 1.0340283770972167e-05,
986
+ "loss": 0.2241,
987
+ "step": 13200
988
+ },
989
+ {
990
+ "epoch": 6.05,
991
+ "grad_norm": 0.004868438933044672,
992
+ "learning_rate": 1.0136979402436069e-05,
993
+ "loss": 0.1529,
994
+ "step": 13300
995
+ },
996
+ {
997
+ "epoch": 6.1,
998
+ "grad_norm": 45.70091247558594,
999
+ "learning_rate": 9.93466847719919e-06,
1000
+ "loss": 0.1543,
1001
+ "step": 13400
1002
+ },
1003
+ {
1004
+ "epoch": 6.14,
1005
+ "grad_norm": 0.8413438200950623,
1006
+ "learning_rate": 9.733392324396167e-06,
1007
+ "loss": 0.1236,
1008
+ "step": 13500
1009
+ },
1010
+ {
1011
+ "epoch": 6.19,
1012
+ "grad_norm": 13.94447135925293,
1013
+ "learning_rate": 9.533192061772919e-06,
1014
+ "loss": 0.1151,
1015
+ "step": 13600
1016
+ },
1017
+ {
1018
+ "epoch": 6.23,
1019
+ "grad_norm": 0.031886328011751175,
1020
+ "learning_rate": 9.334108587286877e-06,
1021
+ "loss": 0.1213,
1022
+ "step": 13700
1023
+ },
1024
+ {
1025
+ "epoch": 6.28,
1026
+ "grad_norm": 1.3224759101867676,
1027
+ "learning_rate": 9.136182570752153e-06,
1028
+ "loss": 0.1151,
1029
+ "step": 13800
1030
+ },
1031
+ {
1032
+ "epoch": 6.32,
1033
+ "grad_norm": 0.02312690019607544,
1034
+ "learning_rate": 8.93945444553128e-06,
1035
+ "loss": 0.1601,
1036
+ "step": 13900
1037
+ },
1038
+ {
1039
+ "epoch": 6.37,
1040
+ "grad_norm": 0.02013530395925045,
1041
+ "learning_rate": 8.743964400275304e-06,
1042
+ "loss": 0.133,
1043
+ "step": 14000
1044
+ },
1045
+ {
1046
+ "epoch": 6.41,
1047
+ "grad_norm": 0.02437993884086609,
1048
+ "learning_rate": 8.549752370713798e-06,
1049
+ "loss": 0.1754,
1050
+ "step": 14100
1051
+ },
1052
+ {
1053
+ "epoch": 6.46,
1054
+ "grad_norm": 0.005770612042397261,
1055
+ "learning_rate": 8.356858031496596e-06,
1056
+ "loss": 0.1421,
1057
+ "step": 14200
1058
+ },
1059
+ {
1060
+ "epoch": 6.51,
1061
+ "grad_norm": 0.008364029228687286,
1062
+ "learning_rate": 8.165320788088888e-06,
1063
+ "loss": 0.1988,
1064
+ "step": 14300
1065
+ },
1066
+ {
1067
+ "epoch": 6.55,
1068
+ "grad_norm": 0.06003904342651367,
1069
+ "learning_rate": 7.975179768721187e-06,
1070
+ "loss": 0.073,
1071
+ "step": 14400
1072
+ },
1073
+ {
1074
+ "epoch": 6.6,
1075
+ "grad_norm": 13.212935447692871,
1076
+ "learning_rate": 7.78647381639607e-06,
1077
+ "loss": 0.1858,
1078
+ "step": 14500
1079
+ },
1080
+ {
1081
+ "epoch": 6.64,
1082
+ "grad_norm": 0.03764009103178978,
1083
+ "learning_rate": 7.599241480953112e-06,
1084
+ "loss": 0.1373,
1085
+ "step": 14600
1086
+ },
1087
+ {
1088
+ "epoch": 6.69,
1089
+ "grad_norm": 0.01994572952389717,
1090
+ "learning_rate": 7.413521011193705e-06,
1091
+ "loss": 0.1165,
1092
+ "step": 14700
1093
+ },
1094
+ {
1095
+ "epoch": 6.73,
1096
+ "grad_norm": 36.3884162902832,
1097
+ "learning_rate": 7.229350347067426e-06,
1098
+ "loss": 0.1989,
1099
+ "step": 14800
1100
+ },
1101
+ {
1102
+ "epoch": 6.78,
1103
+ "grad_norm": 0.005731828045099974,
1104
+ "learning_rate": 7.046767111921425e-06,
1105
+ "loss": 0.0917,
1106
+ "step": 14900
1107
+ },
+ {
+ "epoch": 6.82,
+ "grad_norm": 48.87177658081055,
+ "learning_rate": 6.865808604814564e-06,
+ "loss": 0.1263,
+ "step": 15000
+ },
+ {
+ "epoch": 6.87,
+ "grad_norm": 29.0344295501709,
+ "learning_rate": 6.686511792897767e-06,
+ "loss": 0.1724,
+ "step": 15100
+ },
+ {
+ "epoch": 6.92,
+ "grad_norm": 0.013815321959555149,
+ "learning_rate": 6.508913303862144e-06,
+ "loss": 0.1795,
+ "step": 15200
+ },
+ {
+ "epoch": 6.96,
+ "grad_norm": 29.06551742553711,
+ "learning_rate": 6.333049418456533e-06,
+ "loss": 0.1383,
+ "step": 15300
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.9391650099403579,
+ "eval_loss": 0.3318726718425751,
+ "eval_runtime": 111.0425,
+ "eval_samples_per_second": 22.649,
+ "eval_steps_per_second": 2.837,
+ "step": 15386
+ },
+ {
+ "epoch": 7.01,
+ "grad_norm": 54.77537536621094,
+ "learning_rate": 6.1589560630758656e-06,
+ "loss": 0.102,
+ "step": 15400
+ },
+ {
+ "epoch": 7.05,
+ "grad_norm": 1.445946455001831,
+ "learning_rate": 5.986668802421924e-06,
+ "loss": 0.1061,
+ "step": 15500
+ },
+ {
+ "epoch": 7.1,
+ "grad_norm": 30.50276756286621,
+ "learning_rate": 5.8162228322380155e-06,
+ "loss": 0.1004,
+ "step": 15600
+ },
+ {
+ "epoch": 7.14,
+ "grad_norm": 0.28580981492996216,
+ "learning_rate": 5.647652972118998e-06,
+ "loss": 0.095,
+ "step": 15700
+ },
+ {
+ "epoch": 7.19,
+ "grad_norm": 8.433425903320312,
+ "learning_rate": 5.480993658398129e-06,
+ "loss": 0.2191,
+ "step": 15800
+ },
+ {
+ "epoch": 7.23,
+ "grad_norm": 23.46518898010254,
+ "learning_rate": 5.316278937112267e-06,
+ "loss": 0.1039,
+ "step": 15900
+ },
+ {
+ "epoch": 7.28,
+ "grad_norm": 0.1998785138130188,
+ "learning_rate": 5.153542457046737e-06,
+ "loss": 0.1161,
+ "step": 16000
+ },
+ {
+ "epoch": 7.32,
+ "grad_norm": 0.5174553990364075,
+ "learning_rate": 4.992817462861397e-06,
+ "loss": 0.1093,
+ "step": 16100
+ },
+ {
+ "epoch": 7.37,
+ "grad_norm": 10.745183944702148,
+ "learning_rate": 4.834136788299248e-06,
+ "loss": 0.152,
+ "step": 16200
+ },
+ {
+ "epoch": 7.42,
+ "grad_norm": 0.035963889211416245,
+ "learning_rate": 4.67753284947898e-06,
+ "loss": 0.0887,
+ "step": 16300
+ },
+ {
+ "epoch": 7.46,
+ "grad_norm": 0.02713492140173912,
+ "learning_rate": 4.523037638272822e-06,
+ "loss": 0.0964,
+ "step": 16400
+ },
+ {
+ "epoch": 7.51,
+ "grad_norm": 0.04027700796723366,
+ "learning_rate": 4.370682715771108e-06,
+ "loss": 0.0898,
+ "step": 16500
+ },
+ {
+ "epoch": 7.55,
+ "grad_norm": 32.831809997558594,
+ "learning_rate": 4.220499205834783e-06,
+ "loss": 0.1376,
+ "step": 16600
+ },
+ {
+ "epoch": 7.6,
+ "grad_norm": 1.195786952972412,
+ "learning_rate": 4.072517788737264e-06,
+ "loss": 0.1386,
+ "step": 16700
+ },
+ {
+ "epoch": 7.64,
+ "grad_norm": 0.0985260009765625,
+ "learning_rate": 3.926768694896931e-06,
+ "loss": 0.1031,
+ "step": 16800
+ },
+ {
+ "epoch": 7.69,
+ "grad_norm": 66.71458435058594,
+ "learning_rate": 3.783281698701482e-06,
+ "loss": 0.1584,
+ "step": 16900
+ },
+ {
+ "epoch": 7.73,
+ "grad_norm": 0.06048920005559921,
+ "learning_rate": 3.6420861124254607e-06,
+ "loss": 0.1557,
+ "step": 17000
+ },
+ {
+ "epoch": 7.78,
+ "grad_norm": 35.74193572998047,
+ "learning_rate": 3.5032107802422107e-06,
+ "loss": 0.1509,
+ "step": 17100
+ },
+ {
+ "epoch": 7.83,
+ "grad_norm": 0.14910955727100372,
+ "learning_rate": 3.3666840723314145e-06,
+ "loss": 0.1421,
+ "step": 17200
+ },
+ {
+ "epoch": 7.87,
+ "grad_norm": 44.823760986328125,
+ "learning_rate": 3.232533879083511e-06,
+ "loss": 0.1181,
+ "step": 17300
+ },
+ {
+ "epoch": 7.92,
+ "grad_norm": 7.532151699066162,
+ "learning_rate": 3.1007876054020724e-06,
+ "loss": 0.1385,
+ "step": 17400
+ },
+ {
+ "epoch": 7.96,
+ "grad_norm": 28.50664520263672,
+ "learning_rate": 2.9714721651054e-06,
+ "loss": 0.1222,
+ "step": 17500
+ },
+ {
+ "epoch": 8.0,
+ "eval_accuracy": 0.9479125248508946,
+ "eval_loss": 0.3132196068763733,
+ "eval_runtime": 111.2173,
+ "eval_samples_per_second": 22.613,
+ "eval_steps_per_second": 2.832,
+ "step": 17584
+ },
+ {
+ "epoch": 8.01,
+ "grad_norm": 0.00245882966555655,
+ "learning_rate": 2.8446139754284486e-06,
+ "loss": 0.0782,
+ "step": 17600
+ },
+ {
+ "epoch": 8.05,
+ "grad_norm": 0.3096585273742676,
+ "learning_rate": 2.7202389516261346e-06,
+ "loss": 0.1482,
+ "step": 17700
+ },
+ {
+ "epoch": 8.1,
+ "grad_norm": 38.84019088745117,
+ "learning_rate": 2.5983725016792574e-06,
+ "loss": 0.1191,
+ "step": 17800
+ },
+ {
+ "epoch": 8.14,
+ "grad_norm": 0.24393437802791595,
+ "learning_rate": 2.4790395211040296e-06,
+ "loss": 0.1095,
+ "step": 17900
+ },
+ {
+ "epoch": 8.19,
+ "grad_norm": 0.32781603932380676,
+ "learning_rate": 2.36226438786627e-06,
+ "loss": 0.121,
+ "step": 18000
+ },
+ {
+ "epoch": 8.23,
+ "grad_norm": 8.179335594177246,
+ "learning_rate": 2.2480709574013637e-06,
+ "loss": 0.0564,
+ "step": 18100
+ },
+ {
+ "epoch": 8.28,
+ "grad_norm": 0.03705134242773056,
+ "learning_rate": 2.1364825577409424e-06,
+ "loss": 0.1466,
+ "step": 18200
+ },
+ {
+ "epoch": 8.33,
+ "grad_norm": 0.8985774517059326,
+ "learning_rate": 2.0275219847473026e-06,
+ "loss": 0.1097,
+ "step": 18300
+ },
+ {
+ "epoch": 8.37,
+ "grad_norm": 7.916272163391113,
+ "learning_rate": 1.9212114974565664e-06,
+ "loss": 0.1213,
+ "step": 18400
+ },
+ {
+ "epoch": 8.42,
+ "grad_norm": 0.16036684811115265,
+ "learning_rate": 1.8175728135314707e-06,
+ "loss": 0.0848,
+ "step": 18500
+ },
+ {
+ "epoch": 8.46,
+ "grad_norm": 0.01501951552927494,
+ "learning_rate": 1.7166271048247796e-06,
+ "loss": 0.089,
+ "step": 18600
+ },
+ {
+ "epoch": 8.51,
+ "grad_norm": 80.24330139160156,
+ "learning_rate": 1.6183949930541898e-06,
+ "loss": 0.1014,
+ "step": 18700
+ },
+ {
+ "epoch": 8.55,
+ "grad_norm": 0.014790402725338936,
+ "learning_rate": 1.5228965455896054e-06,
+ "loss": 0.1042,
+ "step": 18800
+ },
+ {
+ "epoch": 8.6,
+ "grad_norm": 0.005228283815085888,
+ "learning_rate": 1.4301512713536873e-06,
+ "loss": 0.1056,
+ "step": 18900
+ },
+ {
+ "epoch": 8.64,
+ "grad_norm": 0.0038773128762841225,
+ "learning_rate": 1.3401781168364591e-06,
+ "loss": 0.0963,
+ "step": 19000
+ },
+ {
+ "epoch": 8.69,
+ "grad_norm": 0.009148034267127514,
+ "learning_rate": 1.2529954622248114e-06,
+ "loss": 0.109,
+ "step": 19100
+ },
+ {
+ "epoch": 8.74,
+ "grad_norm": 0.10844717919826508,
+ "learning_rate": 1.1686211176477208e-06,
+ "loss": 0.1075,
+ "step": 19200
+ },
+ {
+ "epoch": 8.78,
+ "grad_norm": 0.017569424584507942,
+ "learning_rate": 1.0870723195378852e-06,
+ "loss": 0.1228,
+ "step": 19300
+ },
+ {
+ "epoch": 8.83,
+ "grad_norm": 39.32170104980469,
+ "learning_rate": 1.00836572711058e-06,
+ "loss": 0.1414,
+ "step": 19400
+ },
+ {
+ "epoch": 8.87,
+ "grad_norm": 28.396100997924805,
+ "learning_rate": 9.325174189604346e-07,
+ "loss": 0.129,
+ "step": 19500
+ },
+ {
+ "epoch": 8.92,
+ "grad_norm": 0.031751640141010284,
+ "learning_rate": 8.595428897768071e-07,
+ "loss": 0.1163,
+ "step": 19600
+ },
+ {
+ "epoch": 8.96,
+ "grad_norm": 1.8334925174713135,
+ "learning_rate": 7.894570471784418e-07,
+ "loss": 0.1196,
+ "step": 19700
+ },
+ {
+ "epoch": 9.0,
+ "eval_accuracy": 0.9467196819085487,
+ "eval_loss": 0.3136024475097656,
+ "eval_runtime": 111.3615,
+ "eval_samples_per_second": 22.584,
+ "eval_steps_per_second": 2.829,
+ "step": 19782
+ },
+ {
+ "epoch": 9.01,
+ "grad_norm": 9.599562644958496,
+ "learning_rate": 7.222742086680756e-07,
+ "loss": 0.1225,
+ "step": 19800
+ },
+ {
+ "epoch": 9.05,
+ "grad_norm": 20.813501358032227,
+ "learning_rate": 6.580080987075721e-07,
+ "loss": 0.0845,
+ "step": 19900
+ },
+ {
+ "epoch": 9.1,
+ "grad_norm": 0.00961952656507492,
+ "learning_rate": 5.966718459142196e-07,
+ "loss": 0.107,
+ "step": 20000
+ },
+ {
+ "epoch": 9.14,
+ "grad_norm": 0.0361330471932888,
+ "learning_rate": 5.382779803787579e-07,
+ "loss": 0.0993,
+ "step": 20100
+ },
+ {
+ "epoch": 9.19,
+ "grad_norm": 0.8669134378433228,
+ "learning_rate": 4.82838431105655e-07,
+ "loss": 0.0874,
+ "step": 20200
+ },
+ {
+ "epoch": 9.24,
+ "grad_norm": 0.02739112637937069,
+ "learning_rate": 4.303645235761866e-07,
+ "loss": 0.1095,
+ "step": 20300
+ },
+ {
+ "epoch": 9.28,
+ "grad_norm": 0.019733713939785957,
+ "learning_rate": 3.808669774348167e-07,
+ "loss": 0.1325,
+ "step": 20400
+ },
+ {
+ "epoch": 9.33,
+ "grad_norm": 0.0024758102372288704,
+ "learning_rate": 3.3435590429932493e-07,
+ "loss": 0.0883,
+ "step": 20500
+ },
+ {
+ "epoch": 9.37,
+ "grad_norm": 0.04354145750403404,
+ "learning_rate": 2.908408056951578e-07,
+ "loss": 0.0939,
+ "step": 20600
+ },
+ {
+ "epoch": 9.42,
+ "grad_norm": 10.014009475708008,
+ "learning_rate": 2.5033057111440106e-07,
+ "loss": 0.0856,
+ "step": 20700
+ },
+ {
+ "epoch": 9.46,
+ "grad_norm": 0.00877867080271244,
+ "learning_rate": 2.1283347619979243e-07,
+ "loss": 0.0977,
+ "step": 20800
+ },
+ {
+ "epoch": 9.51,
+ "grad_norm": 0.0024496885016560555,
+ "learning_rate": 1.7835718105413235e-07,
+ "loss": 0.1331,
+ "step": 20900
+ },
+ {
+ "epoch": 9.55,
+ "grad_norm": 0.2954551577568054,
+ "learning_rate": 1.4690872867542892e-07,
+ "loss": 0.126,
+ "step": 21000
+ },
+ {
+ "epoch": 9.6,
+ "grad_norm": 0.06035974249243736,
+ "learning_rate": 1.1849454351812394e-07,
+ "loss": 0.077,
+ "step": 21100
+ },
+ {
+ "epoch": 9.65,
+ "grad_norm": 0.7805526852607727,
+ "learning_rate": 9.312043018067762e-08,
+ "loss": 0.1167,
+ "step": 21200
+ },
+ {
+ "epoch": 9.69,
+ "grad_norm": 0.09065116941928864,
+ "learning_rate": 7.079157221975718e-08,
+ "loss": 0.1658,
+ "step": 21300
+ },
+ {
+ "epoch": 9.74,
+ "grad_norm": 0.00335172307677567,
+ "learning_rate": 5.1512531091333914e-08,
+ "loss": 0.126,
+ "step": 21400
+ },
+ {
+ "epoch": 9.78,
+ "grad_norm": 1.6308414936065674,
+ "learning_rate": 3.528724521882687e-08,
+ "loss": 0.1088,
+ "step": 21500
+ },
+ {
+ "epoch": 9.83,
+ "grad_norm": 22.293624877929688,
+ "learning_rate": 2.211902918855313e-08,
+ "loss": 0.1201,
+ "step": 21600
+ },
+ {
+ "epoch": 9.87,
+ "grad_norm": 0.030308537185192108,
+ "learning_rate": 1.2010573072602783e-08,
+ "loss": 0.0758,
+ "step": 21700
+ },
+ {
+ "epoch": 9.92,
+ "grad_norm": 5.30928897857666,
+ "learning_rate": 4.963941879295164e-09,
+ "loss": 0.0952,
+ "step": 21800
+ },
+ {
+ "epoch": 9.96,
+ "grad_norm": 0.04648282751441002,
+ "learning_rate": 9.805751313296529e-10,
+ "loss": 0.1257,
+ "step": 21900
+ },
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.94831013916501,
+ "eval_loss": 0.3119918704032898,
+ "eval_runtime": 111.4853,
+ "eval_samples_per_second": 22.559,
+ "eval_steps_per_second": 2.825,
+ "step": 21980
+ },
+ {
+ "epoch": 10.0,
+ "step": 21980,
+ "total_flos": 4.09349935387607e+19,
+ "train_loss": 0.2596938943081926,
+ "train_runtime": 18130.6882,
+ "train_samples_per_second": 9.697,
+ "train_steps_per_second": 1.212
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 21980,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 10,
+ "save_steps": 500,
+ "total_flos": 4.09349935387607e+19,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }