nsugianto committed on
Commit 2611329
1 Parent(s): 5f21dc3

🍻 cheers

README.md CHANGED
@@ -2,6 +2,7 @@
  license: apache-2.0
  base_model: nsugianto/vit-base-lcdoctypev1_session2
  tags:
+ - image-classification
  - generated_from_trainer
  datasets:
  - imagefolder
@@ -14,7 +15,7 @@ model-index:
  name: Image Classification
  type: image-classification
  dataset:
- name: imagefolder
+ name: doctype_v1
  type: imagefolder
  config: default
  split: validation
@@ -30,9 +31,9 @@ should probably proofread and complete it, then remove this comment. -->

  # vit-base-lcdoctypev1_session3

- This model is a fine-tuned version of [nsugianto/vit-base-lcdoctypev1_session2](https://huggingface.co/nsugianto/vit-base-lcdoctypev1_session2) on the imagefolder dataset.
+ This model is a fine-tuned version of [nsugianto/vit-base-lcdoctypev1_session2](https://huggingface.co/nsugianto/vit-base-lcdoctypev1_session2) on the doctype_v1 dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1338
+ - Loss: 0.1050
  - Accuracy: 0.9669

  ## Model description
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.10501641035079956,
+ "eval_runtime": 8.768,
+ "eval_samples_per_second": 13.8,
+ "eval_steps_per_second": 1.825,
+ "total_flos": 7.39286832673751e+17,
+ "train_loss": 0.06743512197087208,
+ "train_runtime": 3546.8991,
+ "train_samples_per_second": 2.69,
+ "train_steps_per_second": 0.169
+ }
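
A quick consistency check on the training figures above: multiplying the runtime by the reported throughput gives rough totals for samples and optimizer steps. This is only a sketch. The throughput values are rounded to three decimals, so the results are estimates, and the variable names are illustrative rather than part of the Trainer output.

```python
# Values copied from the all_results.json diff above.
train_runtime = 3546.8991          # seconds
train_samples_per_second = 2.69    # rounded in the source
train_steps_per_second = 0.169     # rounded in the source
epochs = 10.0

# Throughput times runtime gives approximate totals (rounding error applies).
samples_seen = train_runtime * train_samples_per_second
steps_taken = train_runtime * train_steps_per_second

# Derived estimates: training-set size and samples consumed per optimizer step.
samples_per_epoch = samples_seen / epochs
samples_per_step = samples_seen / steps_taken

print(f"samples seen (all epochs): ~{samples_seen:.0f}")
print(f"optimizer steps:           ~{steps_taken:.0f}")
print(f"samples per epoch:         ~{samples_per_epoch:.0f}")
print(f"samples per step:          ~{samples_per_step:.1f}")
```

The step estimate lands close to the `global_step` of 600 recorded in trainer_state.json, and the ratio suggests roughly 16 samples per optimizer step. The actual batch size and any gradient accumulation are not stated in this commit, so treat that as an inference.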
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.10501641035079956,
+ "eval_runtime": 8.768,
+ "eval_samples_per_second": 13.8,
+ "eval_steps_per_second": 1.825
+ }
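
The evaluation numbers above can be read back into sample counts. The sketch below assumes the reported throughput is simply samples (or batches) divided by runtime, which is how these fields are usually interpreted; the variable names are illustrative.

```python
import math

# Values copied from the eval_results.json diff above.
eval_accuracy = 0.9669421487603306
eval_runtime = 8.768               # seconds
eval_samples_per_second = 13.8     # rounded in the source
eval_steps_per_second = 1.825      # rounded in the source

# Recover integer counts from the rounded throughput figures.
eval_samples = round(eval_runtime * eval_samples_per_second)
eval_batches = round(eval_runtime * eval_steps_per_second)

# Accuracy is correct / total, so the number of correct predictions follows.
correct = round(eval_accuracy * eval_samples)
errors = eval_samples - correct

# Sanity check: the recovered fraction reproduces the reported accuracy.
assert math.isclose(correct / eval_samples, eval_accuracy, rel_tol=1e-12)

print(f"{correct}/{eval_samples} correct, {errors} errors, {eval_batches} batches")
```

With a validation split this small, a single misclassified image moves accuracy by a little under one percentage point, which is worth keeping in mind when comparing the per-step `eval_accuracy` values in trainer_state.json.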
runs/Mar29_00-56-56_5b2ae90c8915/events.out.tfevents.1711677406.5b2ae90c8915.192.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da8b2c7ff7f30718159307c053a596ae10df2fb1033e13652514c3c4b848b77e
+ size 411
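
The three added lines are a Git LFS pointer rather than the TensorBoard event file itself: each line is a key, a space, and a value. A minimal parsing sketch follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is a key, a single space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer content copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:da8b2c7ff7f30718159307c053a596ae10df2fb1033e13652514c3c4b848b77e
size 411
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field describes the object stored in LFS (411 bytes here), not the pointer text committed to the repository.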
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 10.0,
+ "total_flos": 7.39286832673751e+17,
+ "train_loss": 0.06743512197087208,
+ "train_runtime": 3546.8991,
+ "train_samples_per_second": 2.69,
+ "train_steps_per_second": 0.169
+ }
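
Judging by the values in this commit, all_results.json appears to be the union of train_results.json and eval_results.json. The sketch below checks that with the numbers copied from the three diffs; how the files are actually produced is not shown in this commit, so the merge is an assumption about their relationship, not a description of the Trainer's code.

```python
# Contents copied from the three JSON diffs in this commit.
eval_results = {
    "epoch": 10.0,
    "eval_accuracy": 0.9669421487603306,
    "eval_loss": 0.10501641035079956,
    "eval_runtime": 8.768,
    "eval_samples_per_second": 13.8,
    "eval_steps_per_second": 1.825,
}
train_results = {
    "epoch": 10.0,
    "total_flos": 7.39286832673751e+17,
    "train_loss": 0.06743512197087208,
    "train_runtime": 3546.8991,
    "train_samples_per_second": 2.69,
    "train_steps_per_second": 0.169,
}
all_results = {
    "epoch": 10.0,
    "eval_accuracy": 0.9669421487603306,
    "eval_loss": 0.10501641035079956,
    "eval_runtime": 8.768,
    "eval_samples_per_second": 13.8,
    "eval_steps_per_second": 1.825,
    "total_flos": 7.39286832673751e+17,
    "train_loss": 0.06743512197087208,
    "train_runtime": 3546.8991,
    "train_samples_per_second": 2.69,
    "train_steps_per_second": 0.169,
}

# Merge the two per-phase files; the shared "epoch" key has the same value in both.
merged = {**train_results, **eval_results}
shared_keys = set(train_results) & set(eval_results)

print("merged == all_results:", merged == all_results)
print("shared keys:", sorted(shared_keys))
```

If the check holds, all_results.json adds no new information beyond the two per-phase files, which can simplify downstream tooling that only needs one of them.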
trainer_state.json ADDED
@@ -0,0 +1,1530 @@
+ {
+ "best_metric": 0.10501641035079956,
+ "best_model_checkpoint": "./vit-base-lcdoctypev1_session3/checkpoint-335",
+ "epoch": 10.0,
+ "eval_steps": 5,
+ "global_step": 600,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.08,
+ "eval_accuracy": 0.9090909090909091,
+ "eval_loss": 0.3158922493457794,
+ "eval_runtime": 9.7781,
+ "eval_samples_per_second": 12.375,
+ "eval_steps_per_second": 1.636,
+ "step": 5
+ },
+ {
+ "epoch": 0.17,
+ "grad_norm": 6.890113353729248,
+ "learning_rate": 0.00019666666666666666,
+ "loss": 0.1798,
+ "step": 10
+ },
+ {
+ "epoch": 0.17,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.22616656124591827,
+ "eval_runtime": 8.5759,
+ "eval_samples_per_second": 14.109,
+ "eval_steps_per_second": 1.866,
+ "step": 10
+ },
+ {
+ "epoch": 0.25,
+ "eval_accuracy": 0.7768595041322314,
+ "eval_loss": 0.9910252690315247,
+ "eval_runtime": 9.2477,
+ "eval_samples_per_second": 13.084,
+ "eval_steps_per_second": 1.73,
+ "step": 15
+ },
+ {
+ "epoch": 0.33,
+ "grad_norm": 5.934413909912109,
+ "learning_rate": 0.0001936666666666667,
+ "loss": 0.3815,
+ "step": 20
+ },
+ {
+ "epoch": 0.33,
+ "eval_accuracy": 0.9008264462809917,
+ "eval_loss": 0.3035498857498169,
+ "eval_runtime": 8.7721,
+ "eval_samples_per_second": 13.794,
+ "eval_steps_per_second": 1.824,
+ "step": 20
+ },
+ {
+ "epoch": 0.42,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.21773552894592285,
+ "eval_runtime": 9.0833,
+ "eval_samples_per_second": 13.321,
+ "eval_steps_per_second": 1.761,
+ "step": 25
+ },
+ {
+ "epoch": 0.5,
+ "grad_norm": 0.3238607347011566,
+ "learning_rate": 0.00019033333333333334,
+ "loss": 0.1429,
+ "step": 30
+ },
+ {
+ "epoch": 0.5,
+ "eval_accuracy": 0.8842975206611571,
+ "eval_loss": 0.4908874034881592,
+ "eval_runtime": 8.456,
+ "eval_samples_per_second": 14.309,
+ "eval_steps_per_second": 1.892,
+ "step": 30
+ },
+ {
+ "epoch": 0.58,
+ "eval_accuracy": 0.9256198347107438,
+ "eval_loss": 0.3096350133419037,
+ "eval_runtime": 10.0423,
+ "eval_samples_per_second": 12.049,
+ "eval_steps_per_second": 1.593,
+ "step": 35
+ },
+ {
+ "epoch": 0.67,
+ "grad_norm": 5.55199670791626,
+ "learning_rate": 0.00018700000000000002,
+ "loss": 0.2424,
+ "step": 40
+ },
+ {
+ "epoch": 0.67,
+ "eval_accuracy": 0.9173553719008265,
+ "eval_loss": 0.32702526450157166,
+ "eval_runtime": 8.357,
+ "eval_samples_per_second": 14.479,
+ "eval_steps_per_second": 1.915,
+ "step": 40
+ },
+ {
+ "epoch": 0.75,
+ "eval_accuracy": 0.9173553719008265,
+ "eval_loss": 0.2554916441440582,
+ "eval_runtime": 9.2174,
+ "eval_samples_per_second": 13.127,
+ "eval_steps_per_second": 1.736,
+ "step": 45
+ },
+ {
+ "epoch": 0.83,
+ "grad_norm": 14.463412284851074,
+ "learning_rate": 0.00018366666666666667,
+ "loss": 0.1172,
+ "step": 50
+ },
+ {
+ "epoch": 0.83,
+ "eval_accuracy": 0.9173553719008265,
+ "eval_loss": 0.2309066504240036,
+ "eval_runtime": 9.1271,
+ "eval_samples_per_second": 13.257,
+ "eval_steps_per_second": 1.753,
+ "step": 50
+ },
+ {
+ "epoch": 0.92,
+ "eval_accuracy": 0.9173553719008265,
+ "eval_loss": 0.2952496409416199,
+ "eval_runtime": 9.6618,
+ "eval_samples_per_second": 12.523,
+ "eval_steps_per_second": 1.656,
+ "step": 55
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 0.9041020274162292,
+ "learning_rate": 0.00018033333333333334,
+ "loss": 0.1185,
+ "step": 60
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.9173553719008265,
+ "eval_loss": 0.295705646276474,
+ "eval_runtime": 8.6323,
+ "eval_samples_per_second": 14.017,
+ "eval_steps_per_second": 1.854,
+ "step": 60
+ },
+ {
+ "epoch": 1.08,
+ "eval_accuracy": 0.8925619834710744,
+ "eval_loss": 0.3724129796028137,
+ "eval_runtime": 8.5924,
+ "eval_samples_per_second": 14.082,
+ "eval_steps_per_second": 1.862,
+ "step": 65
+ },
+ {
+ "epoch": 1.17,
+ "grad_norm": 24.987598419189453,
+ "learning_rate": 0.00017700000000000002,
+ "loss": 0.1594,
+ "step": 70
+ },
+ {
+ "epoch": 1.17,
+ "eval_accuracy": 0.8842975206611571,
+ "eval_loss": 0.4216250777244568,
+ "eval_runtime": 9.0834,
+ "eval_samples_per_second": 13.321,
+ "eval_steps_per_second": 1.761,
+ "step": 70
+ },
+ {
+ "epoch": 1.25,
+ "eval_accuracy": 0.9173553719008265,
+ "eval_loss": 0.347516268491745,
+ "eval_runtime": 8.9447,
+ "eval_samples_per_second": 13.528,
+ "eval_steps_per_second": 1.789,
+ "step": 75
+ },
+ {
+ "epoch": 1.33,
+ "grad_norm": 0.3783435523509979,
+ "learning_rate": 0.00017366666666666667,
+ "loss": 0.1231,
+ "step": 80
+ },
+ {
+ "epoch": 1.33,
+ "eval_accuracy": 0.8925619834710744,
+ "eval_loss": 0.323406845331192,
+ "eval_runtime": 9.122,
+ "eval_samples_per_second": 13.265,
+ "eval_steps_per_second": 1.754,
+ "step": 80
+ },
+ {
+ "epoch": 1.42,
+ "eval_accuracy": 0.8842975206611571,
+ "eval_loss": 0.4309641718864441,
+ "eval_runtime": 8.8847,
+ "eval_samples_per_second": 13.619,
+ "eval_steps_per_second": 1.801,
+ "step": 85
+ },
+ {
+ "epoch": 1.5,
+ "grad_norm": 5.4656147956848145,
+ "learning_rate": 0.00017033333333333334,
+ "loss": 0.0875,
+ "step": 90
+ },
+ {
+ "epoch": 1.5,
+ "eval_accuracy": 0.9256198347107438,
+ "eval_loss": 0.3598105013370514,
+ "eval_runtime": 9.086,
+ "eval_samples_per_second": 13.317,
+ "eval_steps_per_second": 1.761,
+ "step": 90
+ },
+ {
+ "epoch": 1.58,
+ "eval_accuracy": 0.9256198347107438,
+ "eval_loss": 0.3038037419319153,
+ "eval_runtime": 8.8538,
+ "eval_samples_per_second": 13.666,
+ "eval_steps_per_second": 1.807,
+ "step": 95
+ },
+ {
+ "epoch": 1.67,
+ "grad_norm": 0.0762423649430275,
+ "learning_rate": 0.000167,
+ "loss": 0.0897,
+ "step": 100
+ },
+ {
+ "epoch": 1.67,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.25987809896469116,
+ "eval_runtime": 8.7997,
+ "eval_samples_per_second": 13.751,
+ "eval_steps_per_second": 1.818,
+ "step": 100
+ },
+ {
+ "epoch": 1.75,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.1683536171913147,
+ "eval_runtime": 8.0037,
+ "eval_samples_per_second": 15.118,
+ "eval_steps_per_second": 1.999,
+ "step": 105
+ },
+ {
+ "epoch": 1.83,
+ "grad_norm": 1.4948017597198486,
+ "learning_rate": 0.00016366666666666667,
+ "loss": 0.1797,
+ "step": 110
+ },
+ {
+ "epoch": 1.83,
+ "eval_accuracy": 0.9504132231404959,
+ "eval_loss": 0.1412244588136673,
+ "eval_runtime": 8.3997,
+ "eval_samples_per_second": 14.405,
+ "eval_steps_per_second": 1.905,
+ "step": 110
+ },
+ {
+ "epoch": 1.92,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.14531350135803223,
+ "eval_runtime": 8.6407,
+ "eval_samples_per_second": 14.003,
+ "eval_steps_per_second": 1.852,
+ "step": 115
+ },
+ {
+ "epoch": 2.0,
+ "grad_norm": 6.408501148223877,
+ "learning_rate": 0.00016033333333333335,
+ "loss": 0.1178,
+ "step": 120
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.8925619834710744,
+ "eval_loss": 0.3830728530883789,
+ "eval_runtime": 9.0064,
+ "eval_samples_per_second": 13.435,
+ "eval_steps_per_second": 1.777,
+ "step": 120
+ },
+ {
+ "epoch": 2.08,
+ "eval_accuracy": 0.9090909090909091,
+ "eval_loss": 0.3321413993835449,
+ "eval_runtime": 8.8462,
+ "eval_samples_per_second": 13.678,
+ "eval_steps_per_second": 1.809,
+ "step": 125
+ },
+ {
+ "epoch": 2.17,
+ "grad_norm": 0.7101590633392334,
+ "learning_rate": 0.00015700000000000002,
+ "loss": 0.1969,
+ "step": 130
+ },
+ {
+ "epoch": 2.17,
+ "eval_accuracy": 0.9090909090909091,
+ "eval_loss": 0.25461918115615845,
+ "eval_runtime": 8.969,
+ "eval_samples_per_second": 13.491,
+ "eval_steps_per_second": 1.784,
+ "step": 130
+ },
+ {
+ "epoch": 2.25,
+ "eval_accuracy": 0.9504132231404959,
+ "eval_loss": 0.18391890823841095,
+ "eval_runtime": 8.644,
+ "eval_samples_per_second": 13.998,
+ "eval_steps_per_second": 1.851,
+ "step": 135
+ },
+ {
+ "epoch": 2.33,
+ "grad_norm": 0.9350224733352661,
+ "learning_rate": 0.00015366666666666667,
+ "loss": 0.0362,
+ "step": 140
+ },
+ {
+ "epoch": 2.33,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.20266053080558777,
+ "eval_runtime": 8.5546,
+ "eval_samples_per_second": 14.144,
+ "eval_steps_per_second": 1.87,
+ "step": 140
+ },
+ {
+ "epoch": 2.42,
+ "eval_accuracy": 0.9090909090909091,
+ "eval_loss": 0.28766530752182007,
+ "eval_runtime": 8.0124,
+ "eval_samples_per_second": 15.102,
+ "eval_steps_per_second": 1.997,
+ "step": 145
+ },
+ {
+ "epoch": 2.5,
+ "grad_norm": 8.35146427154541,
+ "learning_rate": 0.00015033333333333335,
+ "loss": 0.1047,
+ "step": 150
+ },
+ {
+ "epoch": 2.5,
+ "eval_accuracy": 0.8925619834710744,
+ "eval_loss": 0.4503512978553772,
+ "eval_runtime": 8.454,
+ "eval_samples_per_second": 14.313,
+ "eval_steps_per_second": 1.893,
+ "step": 150
+ },
+ {
+ "epoch": 2.58,
+ "eval_accuracy": 0.9504132231404959,
+ "eval_loss": 0.1810603141784668,
+ "eval_runtime": 9.2694,
+ "eval_samples_per_second": 13.054,
+ "eval_steps_per_second": 1.726,
+ "step": 155
+ },
+ {
+ "epoch": 2.67,
+ "grad_norm": 0.6059785485267639,
+ "learning_rate": 0.000147,
+ "loss": 0.1232,
+ "step": 160
+ },
+ {
+ "epoch": 2.67,
+ "eval_accuracy": 0.9421487603305785,
+ "eval_loss": 0.21074515581130981,
+ "eval_runtime": 8.8489,
+ "eval_samples_per_second": 13.674,
+ "eval_steps_per_second": 1.808,
+ "step": 160
+ },
+ {
+ "epoch": 2.75,
+ "eval_accuracy": 0.9504132231404959,
+ "eval_loss": 0.20863419771194458,
+ "eval_runtime": 8.9684,
+ "eval_samples_per_second": 13.492,
+ "eval_steps_per_second": 1.784,
+ "step": 165
+ },
+ {
+ "epoch": 2.83,
+ "grad_norm": 0.04269757494330406,
+ "learning_rate": 0.00014366666666666667,
+ "loss": 0.0611,
+ "step": 170
+ },
+ {
+ "epoch": 2.83,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.2971450686454773,
+ "eval_runtime": 9.1231,
+ "eval_samples_per_second": 13.263,
+ "eval_steps_per_second": 1.754,
+ "step": 170
+ },
+ {
+ "epoch": 2.92,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.2731765806674957,
+ "eval_runtime": 8.9974,
+ "eval_samples_per_second": 13.448,
+ "eval_steps_per_second": 1.778,
+ "step": 175
+ },
+ {
+ "epoch": 3.0,
+ "grad_norm": 0.9442762732505798,
+ "learning_rate": 0.00014033333333333335,
+ "loss": 0.0815,
+ "step": 180
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.16794723272323608,
+ "eval_runtime": 8.9455,
+ "eval_samples_per_second": 13.526,
+ "eval_steps_per_second": 1.789,
+ "step": 180
+ },
+ {
+ "epoch": 3.08,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.24155646562576294,
+ "eval_runtime": 8.3613,
+ "eval_samples_per_second": 14.471,
+ "eval_steps_per_second": 1.914,
+ "step": 185
+ },
+ {
+ "epoch": 3.17,
+ "grad_norm": 2.08608078956604,
+ "learning_rate": 0.00013700000000000002,
+ "loss": 0.0469,
+ "step": 190
+ },
+ {
+ "epoch": 3.17,
+ "eval_accuracy": 0.9256198347107438,
+ "eval_loss": 0.29269692301750183,
+ "eval_runtime": 8.9407,
+ "eval_samples_per_second": 13.534,
+ "eval_steps_per_second": 1.79,
+ "step": 190
+ },
+ {
+ "epoch": 3.25,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.28314581513404846,
+ "eval_runtime": 8.7918,
+ "eval_samples_per_second": 13.763,
+ "eval_steps_per_second": 1.82,
+ "step": 195
+ },
+ {
+ "epoch": 3.33,
+ "grad_norm": 0.8798501491546631,
+ "learning_rate": 0.00013366666666666667,
+ "loss": 0.0443,
+ "step": 200
+ },
+ {
+ "epoch": 3.33,
+ "eval_accuracy": 0.9421487603305785,
+ "eval_loss": 0.2744951546192169,
+ "eval_runtime": 8.612,
+ "eval_samples_per_second": 14.05,
+ "eval_steps_per_second": 1.858,
+ "step": 200
+ },
+ {
+ "epoch": 3.42,
+ "eval_accuracy": 0.8925619834710744,
+ "eval_loss": 0.4193201959133148,
+ "eval_runtime": 8.9147,
+ "eval_samples_per_second": 13.573,
+ "eval_steps_per_second": 1.795,
+ "step": 205
+ },
+ {
+ "epoch": 3.5,
+ "grad_norm": 0.03738969564437866,
+ "learning_rate": 0.00013033333333333332,
+ "loss": 0.0823,
+ "step": 210
+ },
+ {
+ "epoch": 3.5,
+ "eval_accuracy": 0.9173553719008265,
+ "eval_loss": 0.3746081292629242,
+ "eval_runtime": 8.5854,
+ "eval_samples_per_second": 14.094,
+ "eval_steps_per_second": 1.864,
+ "step": 210
+ },
+ {
+ "epoch": 3.58,
+ "eval_accuracy": 0.9421487603305785,
+ "eval_loss": 0.30296453833580017,
+ "eval_runtime": 8.8651,
+ "eval_samples_per_second": 13.649,
+ "eval_steps_per_second": 1.805,
+ "step": 215
+ },
+ {
+ "epoch": 3.67,
+ "grad_norm": 0.03203318268060684,
+ "learning_rate": 0.000127,
+ "loss": 0.0101,
+ "step": 220
+ },
+ {
+ "epoch": 3.67,
+ "eval_accuracy": 0.9504132231404959,
+ "eval_loss": 0.21464580297470093,
+ "eval_runtime": 8.8029,
+ "eval_samples_per_second": 13.745,
+ "eval_steps_per_second": 1.818,
+ "step": 220
+ },
+ {
+ "epoch": 3.75,
+ "eval_accuracy": 0.9421487603305785,
+ "eval_loss": 0.2514008581638336,
+ "eval_runtime": 9.2073,
+ "eval_samples_per_second": 13.142,
+ "eval_steps_per_second": 1.738,
+ "step": 225
+ },
+ {
+ "epoch": 3.83,
+ "grad_norm": 0.04610053077340126,
+ "learning_rate": 0.00012366666666666667,
+ "loss": 0.16,
+ "step": 230
+ },
+ {
+ "epoch": 3.83,
+ "eval_accuracy": 0.9421487603305785,
+ "eval_loss": 0.25517505407333374,
+ "eval_runtime": 8.8885,
+ "eval_samples_per_second": 13.613,
+ "eval_steps_per_second": 1.8,
+ "step": 230
+ },
+ {
+ "epoch": 3.92,
+ "eval_accuracy": 0.9421487603305785,
+ "eval_loss": 0.22389596700668335,
+ "eval_runtime": 8.5203,
+ "eval_samples_per_second": 14.201,
+ "eval_steps_per_second": 1.878,
+ "step": 235
+ },
+ {
+ "epoch": 4.0,
+ "grad_norm": 3.5699214935302734,
+ "learning_rate": 0.00012033333333333335,
+ "loss": 0.1687,
+ "step": 240
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.9256198347107438,
+ "eval_loss": 0.25712552666664124,
+ "eval_runtime": 8.7329,
+ "eval_samples_per_second": 13.856,
+ "eval_steps_per_second": 1.832,
+ "step": 240
+ },
+ {
+ "epoch": 4.08,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.13568438589572906,
+ "eval_runtime": 8.642,
+ "eval_samples_per_second": 14.001,
+ "eval_steps_per_second": 1.851,
+ "step": 245
+ },
+ {
+ "epoch": 4.17,
+ "grad_norm": 2.446256637573242,
+ "learning_rate": 0.000117,
+ "loss": 0.0758,
+ "step": 250
+ },
+ {
+ "epoch": 4.17,
+ "eval_accuracy": 0.9504132231404959,
+ "eval_loss": 0.17341962456703186,
+ "eval_runtime": 8.0872,
+ "eval_samples_per_second": 14.962,
+ "eval_steps_per_second": 1.978,
+ "step": 250
+ },
+ {
+ "epoch": 4.25,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.11965644359588623,
+ "eval_runtime": 9.0168,
+ "eval_samples_per_second": 13.419,
+ "eval_steps_per_second": 1.774,
+ "step": 255
+ },
+ {
+ "epoch": 4.33,
+ "grad_norm": 0.048665851354599,
+ "learning_rate": 0.00011366666666666667,
+ "loss": 0.042,
+ "step": 260
+ },
+ {
+ "epoch": 4.33,
+ "eval_accuracy": 0.9421487603305785,
+ "eval_loss": 0.23387375473976135,
+ "eval_runtime": 8.7974,
+ "eval_samples_per_second": 13.754,
+ "eval_steps_per_second": 1.819,
+ "step": 260
+ },
+ {
+ "epoch": 4.42,
+ "eval_accuracy": 0.9173553719008265,
+ "eval_loss": 0.2923980951309204,
+ "eval_runtime": 8.7236,
+ "eval_samples_per_second": 13.87,
+ "eval_steps_per_second": 1.834,
+ "step": 265
+ },
+ {
+ "epoch": 4.5,
+ "grad_norm": 0.07859846204519272,
+ "learning_rate": 0.00011033333333333334,
+ "loss": 0.0114,
+ "step": 270
+ },
+ {
+ "epoch": 4.5,
+ "eval_accuracy": 0.9504132231404959,
+ "eval_loss": 0.23183898627758026,
+ "eval_runtime": 8.7284,
+ "eval_samples_per_second": 13.863,
+ "eval_steps_per_second": 1.833,
+ "step": 270
+ },
+ {
+ "epoch": 4.58,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.17654092609882355,
+ "eval_runtime": 8.8357,
+ "eval_samples_per_second": 13.694,
+ "eval_steps_per_second": 1.811,
+ "step": 275
+ },
+ {
+ "epoch": 4.67,
+ "grad_norm": 0.266812801361084,
+ "learning_rate": 0.00010700000000000001,
+ "loss": 0.0197,
+ "step": 280
+ },
+ {
+ "epoch": 4.67,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.12631382048130035,
+ "eval_runtime": 7.9448,
+ "eval_samples_per_second": 15.23,
+ "eval_steps_per_second": 2.014,
+ "step": 280
+ },
+ {
+ "epoch": 4.75,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.12528358399868011,
+ "eval_runtime": 8.1841,
+ "eval_samples_per_second": 14.785,
+ "eval_steps_per_second": 1.955,
+ "step": 285
+ },
+ {
+ "epoch": 4.83,
+ "grad_norm": 0.8625770807266235,
+ "learning_rate": 0.00010366666666666666,
+ "loss": 0.0283,
+ "step": 290
+ },
+ {
+ "epoch": 4.83,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1239379420876503,
+ "eval_runtime": 9.1766,
+ "eval_samples_per_second": 13.186,
+ "eval_steps_per_second": 1.744,
+ "step": 290
+ },
+ {
+ "epoch": 4.92,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.12782499194145203,
+ "eval_runtime": 8.9807,
+ "eval_samples_per_second": 13.473,
+ "eval_steps_per_second": 1.782,
+ "step": 295
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 0.08663175255060196,
+ "learning_rate": 0.00010033333333333335,
+ "loss": 0.1115,
+ "step": 300
+ },
+ {
+ "epoch": 5.0,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.2527827322483063,
+ "eval_runtime": 8.5733,
+ "eval_samples_per_second": 14.114,
+ "eval_steps_per_second": 1.866,
+ "step": 300
+ },
+ {
+ "epoch": 5.08,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.3164093792438507,
+ "eval_runtime": 8.5647,
+ "eval_samples_per_second": 14.128,
+ "eval_steps_per_second": 1.868,
+ "step": 305
+ },
+ {
+ "epoch": 5.17,
+ "grad_norm": 0.11408742517232895,
+ "learning_rate": 9.7e-05,
+ "loss": 0.0404,
+ "step": 310
+ },
+ {
+ "epoch": 5.17,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.2841833829879761,
+ "eval_runtime": 8.7395,
+ "eval_samples_per_second": 13.845,
+ "eval_steps_per_second": 1.831,
+ "step": 310
+ },
+ {
+ "epoch": 5.25,
+ "eval_accuracy": 0.9504132231404959,
+ "eval_loss": 0.17133790254592896,
+ "eval_runtime": 8.8895,
+ "eval_samples_per_second": 13.612,
+ "eval_steps_per_second": 1.8,
+ "step": 315
+ },
+ {
+ "epoch": 5.33,
+ "grad_norm": 0.08482904732227325,
+ "learning_rate": 9.366666666666668e-05,
+ "loss": 0.0719,
+ "step": 320
+ },
+ {
+ "epoch": 5.33,
+ "eval_accuracy": 0.9338842975206612,
+ "eval_loss": 0.18959270417690277,
+ "eval_runtime": 8.641,
+ "eval_samples_per_second": 14.003,
+ "eval_steps_per_second": 1.852,
+ "step": 320
+ },
+ {
+ "epoch": 5.42,
+ "eval_accuracy": 0.9256198347107438,
+ "eval_loss": 0.18550463020801544,
+ "eval_runtime": 8.219,
+ "eval_samples_per_second": 14.722,
+ "eval_steps_per_second": 1.947,
+ "step": 325
+ },
+ {
+ "epoch": 5.5,
+ "grad_norm": 4.354619026184082,
+ "learning_rate": 9.033333333333334e-05,
+ "loss": 0.0435,
+ "step": 330
+ },
+ {
+ "epoch": 5.5,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.15409986674785614,
+ "eval_runtime": 8.5474,
+ "eval_samples_per_second": 14.156,
+ "eval_steps_per_second": 1.872,
+ "step": 330
+ },
+ {
+ "epoch": 5.58,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.10501641035079956,
+ "eval_runtime": 8.6962,
+ "eval_samples_per_second": 13.914,
+ "eval_steps_per_second": 1.84,
+ "step": 335
+ },
+ {
+ "epoch": 5.67,
+ "grad_norm": 0.05306649208068848,
+ "learning_rate": 8.7e-05,
+ "loss": 0.0129,
+ "step": 340
+ },
+ {
+ "epoch": 5.67,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.10632016509771347,
+ "eval_runtime": 8.7849,
+ "eval_samples_per_second": 13.774,
+ "eval_steps_per_second": 1.821,
+ "step": 340
+ },
+ {
+ "epoch": 5.75,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.11378511786460876,
+ "eval_runtime": 8.0973,
+ "eval_samples_per_second": 14.943,
+ "eval_steps_per_second": 1.976,
+ "step": 345
+ },
+ {
+ "epoch": 5.83,
+ "grad_norm": 0.02987060882151127,
+ "learning_rate": 8.366666666666668e-05,
+ "loss": 0.0222,
+ "step": 350
+ },
+ {
+ "epoch": 5.83,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.11444854736328125,
+ "eval_runtime": 8.7513,
+ "eval_samples_per_second": 13.827,
+ "eval_steps_per_second": 1.828,
+ "step": 350
+ },
+ {
+ "epoch": 5.92,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.12378235161304474,
+ "eval_runtime": 9.2818,
+ "eval_samples_per_second": 13.036,
+ "eval_steps_per_second": 1.724,
+ "step": 355
+ },
+ {
+ "epoch": 6.0,
+ "grad_norm": 0.02307078428566456,
+ "learning_rate": 8.033333333333334e-05,
+ "loss": 0.0431,
+ "step": 360
+ },
+ {
+ "epoch": 6.0,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.1342514306306839,
+ "eval_runtime": 7.9783,
+ "eval_samples_per_second": 15.166,
+ "eval_steps_per_second": 2.005,
+ "step": 360
+ },
911
+ {
912
+ "epoch": 6.08,
913
+ "eval_accuracy": 0.9669421487603306,
914
+ "eval_loss": 0.144140362739563,
915
+ "eval_runtime": 8.763,
916
+ "eval_samples_per_second": 13.808,
917
+ "eval_steps_per_second": 1.826,
918
+ "step": 365
+ },
+ {
+ "epoch": 6.17,
+ "grad_norm": 0.016979066655039787,
+ "learning_rate": 7.7e-05,
+ "loss": 0.0064,
+ "step": 370
+ },
+ {
+ "epoch": 6.17,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1470753401517868,
+ "eval_runtime": 8.9895,
+ "eval_samples_per_second": 13.46,
+ "eval_steps_per_second": 1.78,
+ "step": 370
+ },
+ {
+ "epoch": 6.25,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.1360587477684021,
+ "eval_runtime": 8.9513,
+ "eval_samples_per_second": 13.518,
+ "eval_steps_per_second": 1.787,
+ "step": 375
+ },
+ {
+ "epoch": 6.33,
+ "grad_norm": 0.022434255108237267,
+ "learning_rate": 7.366666666666668e-05,
+ "loss": 0.0576,
+ "step": 380
+ },
+ {
+ "epoch": 6.33,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.13161548972129822,
+ "eval_runtime": 8.7711,
+ "eval_samples_per_second": 13.795,
+ "eval_steps_per_second": 1.824,
+ "step": 380
+ },
+ {
+ "epoch": 6.42,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.12319940328598022,
+ "eval_runtime": 8.3275,
+ "eval_samples_per_second": 14.53,
+ "eval_steps_per_second": 1.921,
+ "step": 385
+ },
+ {
+ "epoch": 6.5,
+ "grad_norm": 0.2556796371936798,
+ "learning_rate": 7.033333333333334e-05,
+ "loss": 0.0298,
+ "step": 390
+ },
+ {
+ "epoch": 6.5,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1254538893699646,
+ "eval_runtime": 8.6034,
+ "eval_samples_per_second": 14.064,
+ "eval_steps_per_second": 1.86,
+ "step": 390
+ },
+ {
+ "epoch": 6.58,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.13591305911540985,
+ "eval_runtime": 8.8244,
+ "eval_samples_per_second": 13.712,
+ "eval_steps_per_second": 1.813,
+ "step": 395
+ },
+ {
+ "epoch": 6.67,
+ "grad_norm": 0.03436155617237091,
+ "learning_rate": 6.7e-05,
+ "loss": 0.0097,
+ "step": 400
+ },
+ {
+ "epoch": 6.67,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1434980034828186,
+ "eval_runtime": 9.1677,
+ "eval_samples_per_second": 13.199,
+ "eval_steps_per_second": 1.745,
+ "step": 400
+ },
+ {
+ "epoch": 6.75,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.14506025612354279,
+ "eval_runtime": 8.7551,
+ "eval_samples_per_second": 13.82,
+ "eval_steps_per_second": 1.827,
+ "step": 405
+ },
+ {
+ "epoch": 6.83,
+ "grad_norm": 0.019292179495096207,
+ "learning_rate": 6.366666666666668e-05,
+ "loss": 0.0153,
+ "step": 410
+ },
+ {
+ "epoch": 6.83,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.14391547441482544,
+ "eval_runtime": 8.6401,
+ "eval_samples_per_second": 14.004,
+ "eval_steps_per_second": 1.852,
+ "step": 410
+ },
+ {
+ "epoch": 6.92,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.1352916657924652,
+ "eval_runtime": 8.9781,
+ "eval_samples_per_second": 13.477,
+ "eval_steps_per_second": 1.782,
+ "step": 415
+ },
+ {
+ "epoch": 7.0,
+ "grad_norm": 0.1764516532421112,
+ "learning_rate": 6.033333333333334e-05,
+ "loss": 0.0406,
+ "step": 420
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.13157208263874054,
+ "eval_runtime": 8.4013,
+ "eval_samples_per_second": 14.402,
+ "eval_steps_per_second": 1.904,
+ "step": 420
+ },
+ {
+ "epoch": 7.08,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.13093091547489166,
+ "eval_runtime": 9.2645,
+ "eval_samples_per_second": 13.061,
+ "eval_steps_per_second": 1.727,
+ "step": 425
+ },
+ {
+ "epoch": 7.17,
+ "grad_norm": 0.025451194494962692,
+ "learning_rate": 5.6999999999999996e-05,
+ "loss": 0.0154,
+ "step": 430
+ },
+ {
+ "epoch": 7.17,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.13050581514835358,
+ "eval_runtime": 8.9669,
+ "eval_samples_per_second": 13.494,
+ "eval_steps_per_second": 1.784,
+ "step": 430
+ },
+ {
+ "epoch": 7.25,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.13096679747104645,
+ "eval_runtime": 8.8412,
+ "eval_samples_per_second": 13.686,
+ "eval_steps_per_second": 1.81,
+ "step": 435
+ },
+ {
+ "epoch": 7.33,
+ "grad_norm": 0.01770775578916073,
+ "learning_rate": 5.3666666666666666e-05,
+ "loss": 0.0209,
+ "step": 440
+ },
+ {
+ "epoch": 7.33,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.13012637197971344,
+ "eval_runtime": 8.8578,
+ "eval_samples_per_second": 13.66,
+ "eval_steps_per_second": 1.806,
+ "step": 440
+ },
+ {
+ "epoch": 7.42,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.145931214094162,
+ "eval_runtime": 8.6371,
+ "eval_samples_per_second": 14.009,
+ "eval_steps_per_second": 1.852,
+ "step": 445
+ },
+ {
+ "epoch": 7.5,
+ "grad_norm": 17.443572998046875,
+ "learning_rate": 5.0333333333333335e-05,
+ "loss": 0.0298,
+ "step": 450
+ },
+ {
+ "epoch": 7.5,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.16629938781261444,
+ "eval_runtime": 8.2752,
+ "eval_samples_per_second": 14.622,
+ "eval_steps_per_second": 1.933,
+ "step": 450
+ },
+ {
+ "epoch": 7.58,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.15594066679477692,
+ "eval_runtime": 8.806,
+ "eval_samples_per_second": 13.741,
+ "eval_steps_per_second": 1.817,
+ "step": 455
+ },
+ {
+ "epoch": 7.67,
+ "grad_norm": 0.030463455244898796,
+ "learning_rate": 4.7e-05,
+ "loss": 0.0052,
+ "step": 460
+ },
+ {
+ "epoch": 7.67,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.15159618854522705,
+ "eval_runtime": 8.9863,
+ "eval_samples_per_second": 13.465,
+ "eval_steps_per_second": 1.78,
+ "step": 460
+ },
+ {
+ "epoch": 7.75,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.13964863121509552,
+ "eval_runtime": 8.4469,
+ "eval_samples_per_second": 14.325,
+ "eval_steps_per_second": 1.894,
+ "step": 465
+ },
+ {
+ "epoch": 7.83,
+ "grad_norm": 0.11391649395227432,
+ "learning_rate": 4.3666666666666666e-05,
+ "loss": 0.0172,
+ "step": 470
+ },
+ {
+ "epoch": 7.83,
+ "eval_accuracy": 0.9586776859504132,
+ "eval_loss": 0.13303633034229279,
+ "eval_runtime": 8.9949,
+ "eval_samples_per_second": 13.452,
+ "eval_steps_per_second": 1.779,
+ "step": 470
+ },
+ {
+ "epoch": 7.92,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12355250120162964,
+ "eval_runtime": 8.1109,
+ "eval_samples_per_second": 14.918,
+ "eval_steps_per_second": 1.973,
+ "step": 475
+ },
+ {
+ "epoch": 8.0,
+ "grad_norm": 0.059335533529520035,
+ "learning_rate": 4.0333333333333336e-05,
+ "loss": 0.0348,
+ "step": 480
+ },
+ {
+ "epoch": 8.0,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12098132073879242,
+ "eval_runtime": 8.8837,
+ "eval_samples_per_second": 13.621,
+ "eval_steps_per_second": 1.801,
+ "step": 480
+ },
+ {
+ "epoch": 8.08,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.11751802265644073,
+ "eval_runtime": 9.8045,
+ "eval_samples_per_second": 12.341,
+ "eval_steps_per_second": 1.632,
+ "step": 485
+ },
+ {
+ "epoch": 8.17,
+ "grad_norm": 0.019469719380140305,
+ "learning_rate": 3.7e-05,
+ "loss": 0.0068,
+ "step": 490
+ },
+ {
+ "epoch": 8.17,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.1185019463300705,
+ "eval_runtime": 8.8757,
+ "eval_samples_per_second": 13.633,
+ "eval_steps_per_second": 1.803,
+ "step": 490
+ },
+ {
+ "epoch": 8.25,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12291049212217331,
+ "eval_runtime": 8.9999,
+ "eval_samples_per_second": 13.445,
+ "eval_steps_per_second": 1.778,
+ "step": 495
+ },
+ {
+ "epoch": 8.33,
+ "grad_norm": 0.029128719121217728,
+ "learning_rate": 3.366666666666667e-05,
+ "loss": 0.0305,
+ "step": 500
+ },
+ {
+ "epoch": 8.33,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12297818809747696,
+ "eval_runtime": 9.0782,
+ "eval_samples_per_second": 13.329,
+ "eval_steps_per_second": 1.762,
+ "step": 500
+ },
+ {
+ "epoch": 8.42,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12048203498125076,
+ "eval_runtime": 8.2377,
+ "eval_samples_per_second": 14.688,
+ "eval_steps_per_second": 1.942,
+ "step": 505
+ },
+ {
+ "epoch": 8.5,
+ "grad_norm": 0.01591232791543007,
+ "learning_rate": 3.0333333333333337e-05,
+ "loss": 0.0154,
+ "step": 510
+ },
+ {
+ "epoch": 8.5,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.11965296417474747,
+ "eval_runtime": 9.3646,
+ "eval_samples_per_second": 12.921,
+ "eval_steps_per_second": 1.709,
+ "step": 510
+ },
+ {
+ "epoch": 8.58,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12173377722501755,
+ "eval_runtime": 9.3963,
+ "eval_samples_per_second": 12.877,
+ "eval_steps_per_second": 1.703,
+ "step": 515
+ },
+ {
+ "epoch": 8.67,
+ "grad_norm": 0.02258380502462387,
+ "learning_rate": 2.7000000000000002e-05,
+ "loss": 0.0177,
+ "step": 520
+ },
+ {
+ "epoch": 8.67,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12387024611234665,
+ "eval_runtime": 9.1961,
+ "eval_samples_per_second": 13.158,
+ "eval_steps_per_second": 1.74,
+ "step": 520
+ },
+ {
+ "epoch": 8.75,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12437601387500763,
+ "eval_runtime": 9.196,
+ "eval_samples_per_second": 13.158,
+ "eval_steps_per_second": 1.74,
+ "step": 525
+ },
+ {
+ "epoch": 8.83,
+ "grad_norm": 0.020360412076115608,
+ "learning_rate": 2.3666666666666668e-05,
+ "loss": 0.0123,
+ "step": 530
+ },
+ {
+ "epoch": 8.83,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.12708893418312073,
+ "eval_runtime": 8.8063,
+ "eval_samples_per_second": 13.74,
+ "eval_steps_per_second": 1.817,
+ "step": 530
+ },
+ {
+ "epoch": 8.92,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1300380975008011,
+ "eval_runtime": 8.9605,
+ "eval_samples_per_second": 13.504,
+ "eval_steps_per_second": 1.786,
+ "step": 535
+ },
+ {
+ "epoch": 9.0,
+ "grad_norm": 0.6857224702835083,
+ "learning_rate": 2.0333333333333334e-05,
+ "loss": 0.0154,
+ "step": 540
+ },
+ {
+ "epoch": 9.0,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.13137011229991913,
+ "eval_runtime": 8.8274,
+ "eval_samples_per_second": 13.707,
+ "eval_steps_per_second": 1.813,
+ "step": 540
+ },
+ {
+ "epoch": 9.08,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1295723170042038,
+ "eval_runtime": 8.8666,
+ "eval_samples_per_second": 13.647,
+ "eval_steps_per_second": 1.805,
+ "step": 545
+ },
+ {
+ "epoch": 9.17,
+ "grad_norm": 0.4675326943397522,
+ "learning_rate": 1.7000000000000003e-05,
+ "loss": 0.0331,
+ "step": 550
+ },
+ {
+ "epoch": 9.17,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12511087954044342,
+ "eval_runtime": 9.3211,
+ "eval_samples_per_second": 12.981,
+ "eval_steps_per_second": 1.717,
+ "step": 550
+ },
+ {
+ "epoch": 9.25,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12691423296928406,
+ "eval_runtime": 9.1389,
+ "eval_samples_per_second": 13.24,
+ "eval_steps_per_second": 1.751,
+ "step": 555
+ },
+ {
+ "epoch": 9.33,
+ "grad_norm": 0.02136796899139881,
+ "learning_rate": 1.3666666666666666e-05,
+ "loss": 0.0196,
+ "step": 560
+ },
+ {
+ "epoch": 9.33,
+ "eval_accuracy": 0.9752066115702479,
+ "eval_loss": 0.12836582958698273,
+ "eval_runtime": 8.6274,
+ "eval_samples_per_second": 14.025,
+ "eval_steps_per_second": 1.855,
+ "step": 560
+ },
+ {
+ "epoch": 9.42,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1298026293516159,
+ "eval_runtime": 8.3157,
+ "eval_samples_per_second": 14.551,
+ "eval_steps_per_second": 1.924,
+ "step": 565
+ },
+ {
+ "epoch": 9.5,
+ "grad_norm": 0.014311849139630795,
+ "learning_rate": 1.0333333333333333e-05,
+ "loss": 0.0058,
+ "step": 570
+ },
+ {
+ "epoch": 9.5,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.13134212791919708,
+ "eval_runtime": 9.1247,
+ "eval_samples_per_second": 13.261,
+ "eval_steps_per_second": 1.753,
+ "step": 570
+ },
+ {
+ "epoch": 9.58,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.13212385773658752,
+ "eval_runtime": 8.6468,
+ "eval_samples_per_second": 13.994,
+ "eval_steps_per_second": 1.85,
+ "step": 575
+ },
+ {
+ "epoch": 9.67,
+ "grad_norm": 0.4894683361053467,
+ "learning_rate": 7.000000000000001e-06,
+ "loss": 0.012,
+ "step": 580
+ },
+ {
+ "epoch": 9.67,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1326775997877121,
+ "eval_runtime": 9.2181,
+ "eval_samples_per_second": 13.126,
+ "eval_steps_per_second": 1.736,
+ "step": 580
+ },
+ {
+ "epoch": 9.75,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.13264916837215424,
+ "eval_runtime": 8.9299,
+ "eval_samples_per_second": 13.55,
+ "eval_steps_per_second": 1.792,
+ "step": 585
+ },
+ {
+ "epoch": 9.83,
+ "grad_norm": 0.015242637135088444,
+ "learning_rate": 3.666666666666667e-06,
+ "loss": 0.0081,
+ "step": 590
+ },
+ {
+ "epoch": 9.83,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.13293592631816864,
+ "eval_runtime": 9.2123,
+ "eval_samples_per_second": 13.135,
+ "eval_steps_per_second": 1.737,
+ "step": 590
+ },
+ {
+ "epoch": 9.92,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.13364511728286743,
+ "eval_runtime": 8.9976,
+ "eval_samples_per_second": 13.448,
+ "eval_steps_per_second": 1.778,
+ "step": 595
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 0.014774391427636147,
+ "learning_rate": 3.3333333333333335e-07,
+ "loss": 0.0083,
+ "step": 600
+ },
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9669421487603306,
+ "eval_loss": 0.1337580680847168,
+ "eval_runtime": 8.6721,
+ "eval_samples_per_second": 13.953,
+ "eval_steps_per_second": 1.845,
+ "step": 600
+ },
+ {
+ "epoch": 10.0,
+ "step": 600,
+ "total_flos": 7.39286832673751e+17,
+ "train_loss": 0.06743512197087208,
+ "train_runtime": 3546.8991,
+ "train_samples_per_second": 2.69,
+ "train_steps_per_second": 0.169
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 600,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 10,
+ "save_steps": 5,
+ "total_flos": 7.39286832673751e+17,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }