Omriy123 committed
Commit 01c187c
1 Parent(s): 3c59e2a

🍻 cheers
README.md CHANGED
@@ -1,5 +1,8 @@
 ---
+license: apache-2.0
+base_model: google/vit-base-patch16-224-in21k
 tags:
+- image-classification
 - generated_from_trainer
 datasets:
 - imagefolder
@@ -12,7 +15,7 @@ model-index:
       name: Image Classification
       type: image-classification
     dataset:
-      name: imagefolder
+      name: Dogs_vs_Cats
       type: imagefolder
       config: default
       split: train
@@ -28,7 +31,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # vit_epochs5_batch64_lr0.001_size224_tiles1_seed1_vit_old_transform_old_hp
 
-This model was trained from scratch on the imagefolder dataset.
+This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the Dogs_vs_Cats dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.5220
 - Accuracy: 0.7539
all_results.json ADDED
@@ -0,0 +1,13 @@
+{
+    "epoch": 5.0,
+    "eval_accuracy": 0.7538666666666667,
+    "eval_loss": 0.5220404863357544,
+    "eval_runtime": 53.1687,
+    "eval_samples_per_second": 70.53,
+    "eval_steps_per_second": 1.11,
+    "total_flos": 5.8118992210944e+18,
+    "train_loss": 0.5645027552259729,
+    "train_runtime": 2886.1261,
+    "train_samples_per_second": 25.986,
+    "train_steps_per_second": 0.407
+}
eval_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 5.0,
+    "eval_accuracy": 0.7538666666666667,
+    "eval_loss": 0.5220404863357544,
+    "eval_runtime": 53.1687,
+    "eval_samples_per_second": 70.53,
+    "eval_steps_per_second": 1.11
+}
runs/May24_15-27-09_a15a230f540e/events.out.tfevents.1716567389.a15a230f540e.2637.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f0874aec3c337a6286fc83cc31e8127d1f869e3eb633b8a42ddc571003217660
+size 411
train_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 5.0,
+    "total_flos": 5.8118992210944e+18,
+    "train_loss": 0.5645027552259729,
+    "train_runtime": 2886.1261,
+    "train_samples_per_second": 25.986,
+    "train_steps_per_second": 0.407
+}
trainer_state.json ADDED
@@ -0,0 +1,1732 @@
+{
+  "best_metric": 0.5220404863357544,
+  "best_model_checkpoint": "vit_epochs5_batch64_lr0.001_size224_tiles1_seed1_vit_old_transform_old_hp/checkpoint-1175",
+  "epoch": 5.0,
+  "eval_steps": 500,
+  "global_step": 1175,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02127659574468085,
+      "grad_norm": 1.5236927270889282,
+      "learning_rate": 0.000995744680851064,
+      "loss": 0.8552,
+      "step": 5
+    },
+    {
+      "epoch": 0.0425531914893617,
+      "grad_norm": 0.1933981478214264,
+      "learning_rate": 0.0009914893617021276,
+      "loss": 0.6984,
+      "step": 10
+    },
+    {
+      "epoch": 0.06382978723404255,
+      "grad_norm": 0.2786834239959717,
+      "learning_rate": 0.0009872340425531915,
+      "loss": 0.684,
+      "step": 15
+    },
+    {
+      "epoch": 0.0851063829787234,
+      "grad_norm": 0.3437531888484955,
+      "learning_rate": 0.0009829787234042554,
+      "loss": 0.699,
+      "step": 20
+    },
+    {
+      "epoch": 0.10638297872340426,
+      "grad_norm": 0.17921054363250732,
+      "learning_rate": 0.0009787234042553192,
+      "loss": 0.6876,
+      "step": 25
+    },
+    {
+      "epoch": 0.1276595744680851,
+      "grad_norm": 0.2969794273376465,
+      "learning_rate": 0.0009744680851063829,
+      "loss": 0.7084,
+      "step": 30
+    },
+    {
+      "epoch": 0.14893617021276595,
+      "grad_norm": 0.2975955307483673,
+      "learning_rate": 0.0009702127659574468,
+      "loss": 0.6938,
+      "step": 35
+    },
+    {
+      "epoch": 0.1702127659574468,
+      "grad_norm": 0.049827929586172104,
+      "learning_rate": 0.0009659574468085106,
+      "loss": 0.6834,
+      "step": 40
+    },
+    {
+      "epoch": 0.19148936170212766,
+      "grad_norm": 0.6071491837501526,
+      "learning_rate": 0.0009617021276595745,
+      "loss": 0.6737,
+      "step": 45
+    },
+    {
+      "epoch": 0.2127659574468085,
+      "grad_norm": 0.1733636111021042,
+      "learning_rate": 0.0009574468085106384,
+      "loss": 0.6401,
+      "step": 50
+    },
+    {
+      "epoch": 0.23404255319148937,
+      "grad_norm": 0.6925361752510071,
+      "learning_rate": 0.0009531914893617022,
+      "loss": 0.6786,
+      "step": 55
+    },
+    {
+      "epoch": 0.2553191489361702,
+      "grad_norm": 1.0148730278015137,
+      "learning_rate": 0.000948936170212766,
+      "loss": 0.6925,
+      "step": 60
+    },
+    {
+      "epoch": 0.2765957446808511,
+      "grad_norm": 0.4391551911830902,
+      "learning_rate": 0.0009446808510638298,
+      "loss": 0.7001,
+      "step": 65
+    },
+    {
+      "epoch": 0.2978723404255319,
+      "grad_norm": 0.10365554690361023,
+      "learning_rate": 0.0009404255319148937,
+      "loss": 0.661,
+      "step": 70
+    },
+    {
+      "epoch": 0.3191489361702128,
+      "grad_norm": 0.5373475551605225,
+      "learning_rate": 0.0009361702127659575,
+      "loss": 0.6646,
+      "step": 75
+    },
+    {
+      "epoch": 0.3404255319148936,
+      "grad_norm": 0.26909396052360535,
+      "learning_rate": 0.0009319148936170214,
+      "loss": 0.6496,
+      "step": 80
+    },
+    {
+      "epoch": 0.3617021276595745,
+      "grad_norm": 0.7345396876335144,
+      "learning_rate": 0.0009276595744680851,
+      "loss": 0.6809,
+      "step": 85
+    },
+    {
+      "epoch": 0.3829787234042553,
+      "grad_norm": 0.17642471194267273,
+      "learning_rate": 0.0009234042553191489,
+      "loss": 0.6689,
+      "step": 90
+    },
+    {
+      "epoch": 0.40425531914893614,
+      "grad_norm": 0.24865615367889404,
+      "learning_rate": 0.0009191489361702128,
+      "loss": 0.6668,
+      "step": 95
+    },
+    {
+      "epoch": 0.425531914893617,
+      "grad_norm": 0.0725848600268364,
+      "learning_rate": 0.0009148936170212766,
+      "loss": 0.6955,
+      "step": 100
+    },
+    {
+      "epoch": 0.44680851063829785,
+      "grad_norm": 0.6779701113700867,
+      "learning_rate": 0.0009106382978723405,
+      "loss": 0.6643,
+      "step": 105
+    },
+    {
+      "epoch": 0.46808510638297873,
+      "grad_norm": 0.2594638466835022,
+      "learning_rate": 0.0009063829787234043,
+      "loss": 0.6774,
+      "step": 110
+    },
+    {
+      "epoch": 0.48936170212765956,
+      "grad_norm": 0.41974830627441406,
+      "learning_rate": 0.000902127659574468,
+      "loss": 0.6632,
+      "step": 115
+    },
+    {
+      "epoch": 0.5106382978723404,
+      "grad_norm": 0.2086678445339203,
+      "learning_rate": 0.0008978723404255319,
+      "loss": 0.6264,
+      "step": 120
+    },
+    {
+      "epoch": 0.5319148936170213,
+      "grad_norm": 0.45617616176605225,
+      "learning_rate": 0.0008936170212765957,
+      "loss": 0.6538,
+      "step": 125
+    },
+    {
+      "epoch": 0.5531914893617021,
+      "grad_norm": 0.32972219586372375,
+      "learning_rate": 0.0008893617021276596,
+      "loss": 0.6471,
+      "step": 130
+    },
+    {
+      "epoch": 0.574468085106383,
+      "grad_norm": 0.5587528347969055,
+      "learning_rate": 0.0008851063829787234,
+      "loss": 0.624,
+      "step": 135
+    },
+    {
+      "epoch": 0.5957446808510638,
+      "grad_norm": 0.5918276906013489,
+      "learning_rate": 0.0008808510638297873,
+      "loss": 0.6576,
+      "step": 140
+    },
+    {
+      "epoch": 0.6170212765957447,
+      "grad_norm": 0.35423263907432556,
+      "learning_rate": 0.0008765957446808511,
+      "loss": 0.6376,
+      "step": 145
+    },
+    {
+      "epoch": 0.6382978723404256,
+      "grad_norm": 0.49659672379493713,
+      "learning_rate": 0.0008723404255319149,
+      "loss": 0.6555,
+      "step": 150
+    },
+    {
+      "epoch": 0.6595744680851063,
+      "grad_norm": 0.26542067527770996,
+      "learning_rate": 0.0008680851063829788,
+      "loss": 0.6457,
+      "step": 155
+    },
+    {
+      "epoch": 0.6808510638297872,
+      "grad_norm": 0.5932815670967102,
+      "learning_rate": 0.0008638297872340426,
+      "loss": 0.6706,
+      "step": 160
+    },
+    {
+      "epoch": 0.7021276595744681,
+      "grad_norm": 0.18936298787593842,
+      "learning_rate": 0.0008595744680851064,
+      "loss": 0.6923,
+      "step": 165
+    },
+    {
+      "epoch": 0.723404255319149,
+      "grad_norm": 0.2216617614030838,
+      "learning_rate": 0.0008553191489361703,
+      "loss": 0.6805,
+      "step": 170
+    },
+    {
+      "epoch": 0.7446808510638298,
+      "grad_norm": 0.2572282552719116,
+      "learning_rate": 0.000851063829787234,
+      "loss": 0.6803,
+      "step": 175
+    },
+    {
+      "epoch": 0.7659574468085106,
+      "grad_norm": 0.2624934911727905,
+      "learning_rate": 0.0008468085106382979,
+      "loss": 0.6796,
+      "step": 180
+    },
+    {
+      "epoch": 0.7872340425531915,
+      "grad_norm": 0.3983383774757385,
+      "learning_rate": 0.0008425531914893617,
+      "loss": 0.652,
+      "step": 185
+    },
+    {
+      "epoch": 0.8085106382978723,
+      "grad_norm": 0.7851768136024475,
+      "learning_rate": 0.0008382978723404256,
+      "loss": 0.6972,
+      "step": 190
+    },
+    {
+      "epoch": 0.8297872340425532,
+      "grad_norm": 0.08407687395811081,
+      "learning_rate": 0.0008340425531914894,
+      "loss": 0.7127,
+      "step": 195
+    },
+    {
+      "epoch": 0.851063829787234,
+      "grad_norm": 0.2317022830247879,
+      "learning_rate": 0.0008297872340425531,
+      "loss": 0.6879,
+      "step": 200
+    },
+    {
+      "epoch": 0.8723404255319149,
+      "grad_norm": 0.10921870172023773,
+      "learning_rate": 0.000825531914893617,
+      "loss": 0.6909,
+      "step": 205
+    },
+    {
+      "epoch": 0.8936170212765957,
+      "grad_norm": 0.06697387248277664,
+      "learning_rate": 0.0008212765957446808,
+      "loss": 0.6858,
+      "step": 210
+    },
+    {
+      "epoch": 0.9148936170212766,
+      "grad_norm": 0.16396264731884003,
+      "learning_rate": 0.0008170212765957447,
+      "loss": 0.6836,
+      "step": 215
+    },
+    {
+      "epoch": 0.9361702127659575,
+      "grad_norm": 0.07334744930267334,
+      "learning_rate": 0.0008127659574468085,
+      "loss": 0.6835,
+      "step": 220
+    },
+    {
+      "epoch": 0.9574468085106383,
+      "grad_norm": 0.28075695037841797,
+      "learning_rate": 0.0008085106382978723,
+      "loss": 0.6616,
+      "step": 225
+    },
+    {
+      "epoch": 0.9787234042553191,
+      "grad_norm": 0.32385650277137756,
+      "learning_rate": 0.0008042553191489363,
+      "loss": 0.6763,
+      "step": 230
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.6150110960006714,
+      "learning_rate": 0.0008,
+      "loss": 0.6668,
+      "step": 235
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.5725333333333333,
+      "eval_loss": 0.6652818918228149,
+      "eval_runtime": 52.8415,
+      "eval_samples_per_second": 70.967,
+      "eval_steps_per_second": 1.117,
+      "step": 235
+    },
+    {
+      "epoch": 1.0212765957446808,
+      "grad_norm": 0.36133354902267456,
+      "learning_rate": 0.0007957446808510639,
+      "loss": 0.6505,
+      "step": 240
+    },
+    {
+      "epoch": 1.0425531914893618,
+      "grad_norm": 0.2631653845310211,
+      "learning_rate": 0.0007914893617021277,
+      "loss": 0.6666,
+      "step": 245
+    },
+    {
+      "epoch": 1.0638297872340425,
+      "grad_norm": 0.40402382612228394,
+      "learning_rate": 0.0007872340425531915,
+      "loss": 0.6406,
+      "step": 250
+    },
+    {
+      "epoch": 1.0851063829787233,
+      "grad_norm": 0.22335675358772278,
+      "learning_rate": 0.0007829787234042554,
+      "loss": 0.6584,
+      "step": 255
+    },
+    {
+      "epoch": 1.1063829787234043,
+      "grad_norm": 0.38019102811813354,
+      "learning_rate": 0.0007787234042553192,
+      "loss": 0.6773,
+      "step": 260
+    },
+    {
+      "epoch": 1.127659574468085,
+      "grad_norm": 0.6945547461509705,
+      "learning_rate": 0.000774468085106383,
+      "loss": 0.66,
+      "step": 265
+    },
+    {
+      "epoch": 1.148936170212766,
+      "grad_norm": 0.2084246724843979,
+      "learning_rate": 0.0007702127659574468,
+      "loss": 0.6512,
+      "step": 270
+    },
+    {
+      "epoch": 1.1702127659574468,
+      "grad_norm": 0.1295584738254547,
+      "learning_rate": 0.0007659574468085106,
+      "loss": 0.6521,
+      "step": 275
+    },
+    {
+      "epoch": 1.1914893617021276,
+      "grad_norm": 0.12610581517219543,
+      "learning_rate": 0.0007617021276595745,
+      "loss": 0.6281,
+      "step": 280
+    },
+    {
+      "epoch": 1.2127659574468086,
+      "grad_norm": 0.5777516961097717,
+      "learning_rate": 0.0007574468085106383,
+      "loss": 0.6315,
+      "step": 285
+    },
+    {
+      "epoch": 1.2340425531914894,
+      "grad_norm": 0.4698016047477722,
+      "learning_rate": 0.0007531914893617022,
+      "loss": 0.6736,
+      "step": 290
+    },
+    {
+      "epoch": 1.2553191489361701,
+      "grad_norm": 0.306220680475235,
+      "learning_rate": 0.0007489361702127659,
+      "loss": 0.6616,
+      "step": 295
+    },
+    {
+      "epoch": 1.2765957446808511,
+      "grad_norm": 0.1651347577571869,
+      "learning_rate": 0.0007446808510638298,
+      "loss": 0.6624,
+      "step": 300
+    },
+    {
+      "epoch": 1.297872340425532,
+      "grad_norm": 0.1671248823404312,
+      "learning_rate": 0.0007404255319148936,
+      "loss": 0.6537,
+      "step": 305
+    },
+    {
+      "epoch": 1.3191489361702127,
+      "grad_norm": 0.5579215288162231,
+      "learning_rate": 0.0007361702127659574,
+      "loss": 0.6547,
+      "step": 310
+    },
+    {
+      "epoch": 1.3404255319148937,
+      "grad_norm": 0.20245681703090668,
+      "learning_rate": 0.0007319148936170213,
+      "loss": 0.6477,
+      "step": 315
+    },
+    {
+      "epoch": 1.3617021276595744,
+      "grad_norm": 0.1913478672504425,
+      "learning_rate": 0.0007276595744680852,
+      "loss": 0.6311,
+      "step": 320
+    },
+    {
+      "epoch": 1.3829787234042552,
+      "grad_norm": 0.4945693016052246,
+      "learning_rate": 0.000723404255319149,
+      "loss": 0.5979,
+      "step": 325
+    },
+    {
+      "epoch": 1.4042553191489362,
+      "grad_norm": 0.1921028196811676,
+      "learning_rate": 0.0007191489361702128,
+      "loss": 0.7027,
+      "step": 330
+    },
+    {
+      "epoch": 1.425531914893617,
+      "grad_norm": 0.26029083132743835,
+      "learning_rate": 0.0007148936170212766,
+      "loss": 0.6733,
+      "step": 335
+    },
+    {
+      "epoch": 1.4468085106382977,
+      "grad_norm": 0.3045407831668854,
+      "learning_rate": 0.0007106382978723405,
+      "loss": 0.6619,
+      "step": 340
+    },
+    {
+      "epoch": 1.4680851063829787,
+      "grad_norm": 0.12488707154989243,
+      "learning_rate": 0.0007063829787234043,
+      "loss": 0.666,
+      "step": 345
+    },
+    {
+      "epoch": 1.4893617021276595,
+      "grad_norm": 0.15467241406440735,
+      "learning_rate": 0.0007021276595744682,
+      "loss": 0.634,
+      "step": 350
+    },
+    {
+      "epoch": 1.5106382978723403,
+      "grad_norm": 0.23499886691570282,
+      "learning_rate": 0.0006978723404255319,
+      "loss": 0.6257,
+      "step": 355
+    },
+    {
+      "epoch": 1.5319148936170213,
+      "grad_norm": 0.48748576641082764,
+      "learning_rate": 0.0006936170212765957,
+      "loss": 0.6369,
+      "step": 360
+    },
+    {
+      "epoch": 1.5531914893617023,
+      "grad_norm": 0.3014831244945526,
+      "learning_rate": 0.0006893617021276596,
+      "loss": 0.6274,
+      "step": 365
+    },
+    {
+      "epoch": 1.574468085106383,
+      "grad_norm": 0.12689495086669922,
+      "learning_rate": 0.0006851063829787234,
+      "loss": 0.6427,
+      "step": 370
+    },
+    {
+      "epoch": 1.5957446808510638,
+      "grad_norm": 0.3490160405635834,
+      "learning_rate": 0.0006808510638297873,
+      "loss": 0.6885,
+      "step": 375
+    },
+    {
+      "epoch": 1.6170212765957448,
+      "grad_norm": 0.2676607370376587,
+      "learning_rate": 0.000676595744680851,
+      "loss": 0.6436,
+      "step": 380
+    },
+    {
+      "epoch": 1.6382978723404256,
+      "grad_norm": 0.26951488852500916,
+      "learning_rate": 0.0006723404255319148,
+      "loss": 0.6387,
+      "step": 385
+    },
+    {
+      "epoch": 1.6595744680851063,
+      "grad_norm": 0.3769073784351349,
+      "learning_rate": 0.0006680851063829787,
+      "loss": 0.6003,
+      "step": 390
+    },
+    {
+      "epoch": 1.6808510638297873,
+      "grad_norm": 0.43915122747421265,
+      "learning_rate": 0.0006638297872340425,
+      "loss": 0.6477,
+      "step": 395
+    },
+    {
+      "epoch": 1.702127659574468,
+      "grad_norm": 0.2419726401567459,
+      "learning_rate": 0.0006595744680851064,
+      "loss": 0.6174,
+      "step": 400
+    },
+    {
+      "epoch": 1.7234042553191489,
+      "grad_norm": 0.5210821628570557,
+      "learning_rate": 0.0006553191489361702,
+      "loss": 0.625,
+      "step": 405
+    },
+    {
+      "epoch": 1.7446808510638299,
+      "grad_norm": 0.5546556115150452,
+      "learning_rate": 0.0006510638297872342,
+      "loss": 0.604,
+      "step": 410
+    },
+    {
+      "epoch": 1.7659574468085106,
+      "grad_norm": 0.5459072589874268,
+      "learning_rate": 0.0006468085106382979,
+      "loss": 0.6322,
+      "step": 415
+    },
+    {
+      "epoch": 1.7872340425531914,
+      "grad_norm": 0.28615137934684753,
+      "learning_rate": 0.0006425531914893617,
+      "loss": 0.6288,
+      "step": 420
+    },
+    {
+      "epoch": 1.8085106382978724,
+      "grad_norm": 0.25826430320739746,
+      "learning_rate": 0.0006382978723404256,
+      "loss": 0.6377,
+      "step": 425
+    },
+    {
+      "epoch": 1.8297872340425532,
+      "grad_norm": 0.27113598585128784,
+      "learning_rate": 0.0006340425531914894,
+      "loss": 0.6155,
+      "step": 430
+    },
+    {
+      "epoch": 1.851063829787234,
+      "grad_norm": 0.3145448565483093,
+      "learning_rate": 0.0006297872340425533,
+      "loss": 0.6258,
+      "step": 435
+    },
+    {
+      "epoch": 1.872340425531915,
+      "grad_norm": 0.221902996301651,
+      "learning_rate": 0.000625531914893617,
+      "loss": 0.6133,
+      "step": 440
+    },
+    {
+      "epoch": 1.8936170212765957,
+      "grad_norm": 0.2308581918478012,
+      "learning_rate": 0.0006212765957446808,
+      "loss": 0.5883,
+      "step": 445
+    },
+    {
+      "epoch": 1.9148936170212765,
+      "grad_norm": 0.2169838696718216,
+      "learning_rate": 0.0006170212765957447,
+      "loss": 0.6219,
+      "step": 450
+    },
+    {
+      "epoch": 1.9361702127659575,
+      "grad_norm": 0.32386860251426697,
+      "learning_rate": 0.0006127659574468085,
+      "loss": 0.6102,
+      "step": 455
+    },
+    {
+      "epoch": 1.9574468085106385,
+      "grad_norm": 0.13700896501541138,
+      "learning_rate": 0.0006085106382978724,
+      "loss": 0.6436,
+      "step": 460
+    },
+    {
+      "epoch": 1.978723404255319,
+      "grad_norm": 0.18552586436271667,
+      "learning_rate": 0.0006042553191489362,
+      "loss": 0.6524,
+      "step": 465
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.5744425058364868,
+      "learning_rate": 0.0006,
+      "loss": 0.6527,
+      "step": 470
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.6528,
+      "eval_loss": 0.6233171224594116,
+      "eval_runtime": 52.0761,
+      "eval_samples_per_second": 72.01,
+      "eval_steps_per_second": 1.133,
+      "step": 470
+    },
+    {
+      "epoch": 2.021276595744681,
+      "grad_norm": 0.39053860306739807,
+      "learning_rate": 0.0005957446808510638,
+      "loss": 0.5832,
+      "step": 475
+    },
+    {
+      "epoch": 2.0425531914893615,
+      "grad_norm": 0.2939192056655884,
+      "learning_rate": 0.0005914893617021276,
+      "loss": 0.5808,
+      "step": 480
+    },
+    {
+      "epoch": 2.0638297872340425,
+      "grad_norm": 0.5998929142951965,
+      "learning_rate": 0.0005872340425531915,
+      "loss": 0.6119,
+      "step": 485
+    },
+    {
+      "epoch": 2.0851063829787235,
+      "grad_norm": 0.48165130615234375,
+      "learning_rate": 0.0005829787234042553,
+      "loss": 0.5868,
+      "step": 490
+    },
+    {
+      "epoch": 2.106382978723404,
+      "grad_norm": 0.2857578694820404,
+      "learning_rate": 0.0005787234042553191,
+      "loss": 0.5843,
+      "step": 495
+    },
+    {
+      "epoch": 2.127659574468085,
+      "grad_norm": 0.28461429476737976,
+      "learning_rate": 0.0005744680851063831,
+      "loss": 0.5843,
+      "step": 500
+    },
+    {
+      "epoch": 2.148936170212766,
+      "grad_norm": 0.30877211689949036,
+      "learning_rate": 0.0005702127659574468,
+      "loss": 0.5652,
+      "step": 505
+    },
+    {
+      "epoch": 2.1702127659574466,
+      "grad_norm": 0.7491441369056702,
+      "learning_rate": 0.0005659574468085107,
+      "loss": 0.5687,
+      "step": 510
+    },
+    {
+      "epoch": 2.1914893617021276,
+      "grad_norm": 0.29466772079467773,
+      "learning_rate": 0.0005617021276595745,
+      "loss": 0.6339,
+      "step": 515
+    },
+    {
+      "epoch": 2.2127659574468086,
+      "grad_norm": 0.44021138548851013,
+      "learning_rate": 0.0005574468085106383,
+      "loss": 0.5629,
+      "step": 520
+    },
+    {
+      "epoch": 2.2340425531914896,
+      "grad_norm": 0.19135086238384247,
+      "learning_rate": 0.0005531914893617022,
+      "loss": 0.6169,
+      "step": 525
+    },
+    {
+      "epoch": 2.25531914893617,
+      "grad_norm": 0.6730530858039856,
+      "learning_rate": 0.000548936170212766,
+      "loss": 0.6063,
+      "step": 530
+    },
+    {
+      "epoch": 2.276595744680851,
+      "grad_norm": 0.4451698362827301,
+      "learning_rate": 0.0005446808510638298,
+      "loss": 0.614,
+      "step": 535
+    },
+    {
+      "epoch": 2.297872340425532,
+      "grad_norm": 0.19956566393375397,
+      "learning_rate": 0.0005404255319148936,
+      "loss": 0.5848,
+      "step": 540
+    },
+    {
+      "epoch": 2.3191489361702127,
+      "grad_norm": 0.3573627471923828,
+      "learning_rate": 0.0005361702127659575,
+      "loss": 0.5963,
+      "step": 545
+    },
+    {
+      "epoch": 2.3404255319148937,
+      "grad_norm": 0.22617582976818085,
+      "learning_rate": 0.0005319148936170213,
+      "loss": 0.5512,
+      "step": 550
+    },
+    {
+      "epoch": 2.3617021276595747,
+      "grad_norm": 0.2276870310306549,
+      "learning_rate": 0.0005276595744680851,
+      "loss": 0.5801,
+      "step": 555
+    },
+    {
+      "epoch": 2.382978723404255,
+      "grad_norm": 0.3912278413772583,
+      "learning_rate": 0.000523404255319149,
+      "loss": 0.6101,
+      "step": 560
+    },
+    {
+      "epoch": 2.404255319148936,
+      "grad_norm": 0.20038598775863647,
+      "learning_rate": 0.0005191489361702127,
+      "loss": 0.5842,
+      "step": 565
+    },
+    {
+      "epoch": 2.425531914893617,
+      "grad_norm": 0.27847474813461304,
+      "learning_rate": 0.0005148936170212766,
+      "loss": 0.5597,
+      "step": 570
+    },
+    {
+      "epoch": 2.4468085106382977,
+      "grad_norm": 0.49357470870018005,
+      "learning_rate": 0.0005106382978723404,
+      "loss": 0.5374,
+      "step": 575
+    },
+    {
+      "epoch": 2.4680851063829787,
+      "grad_norm": 0.22584182024002075,
+      "learning_rate": 0.0005063829787234042,
+      "loss": 0.6416,
+      "step": 580
+    },
+    {
+      "epoch": 2.4893617021276597,
+      "grad_norm": 0.4970340430736542,
+      "learning_rate": 0.0005021276595744681,
+      "loss": 0.6101,
+      "step": 585
+    },
+    {
+      "epoch": 2.5106382978723403,
+      "grad_norm": 0.23562884330749512,
+      "learning_rate": 0.000497872340425532,
+      "loss": 0.5728,
+      "step": 590
+    },
+    {
+      "epoch": 2.5319148936170213,
+      "grad_norm": 0.2772935926914215,
+      "learning_rate": 0.0004936170212765957,
+      "loss": 0.5969,
+      "step": 595
+    },
+    {
+      "epoch": 2.5531914893617023,
+      "grad_norm": 0.466553658246994,
+      "learning_rate": 0.0004893617021276596,
+      "loss": 0.5722,
+      "step": 600
+    },
+    {
+      "epoch": 2.574468085106383,
+      "grad_norm": 0.1931866854429245,
+      "learning_rate": 0.0004851063829787234,
+      "loss": 0.5947,
+      "step": 605
+    },
+    {
+      "epoch": 2.595744680851064,
+      "grad_norm": 0.3345823884010315,
+      "learning_rate": 0.00048085106382978723,
+      "loss": 0.5464,
+      "step": 610
+    },
+    {
+      "epoch": 2.617021276595745,
+      "grad_norm": 0.8605038523674011,
+      "learning_rate": 0.0004765957446808511,
+      "loss": 0.616,
+      "step": 615
+    },
+    {
+      "epoch": 2.6382978723404253,
+      "grad_norm": 0.467629611492157,
+      "learning_rate": 0.0004723404255319149,
+      "loss": 0.5997,
+      "step": 620
+    },
+    {
+      "epoch": 2.6595744680851063,
+      "grad_norm": 0.30429497361183167,
+      "learning_rate": 0.00046808510638297874,
+      "loss": 0.5498,
+      "step": 625
+    },
+    {
+      "epoch": 2.6808510638297873,
+      "grad_norm": 0.2898688316345215,
+      "learning_rate": 0.00046382978723404257,
+      "loss": 0.5526,
+      "step": 630
+    },
+    {
+      "epoch": 2.702127659574468,
+      "grad_norm": 0.24966174364089966,
+      "learning_rate": 0.0004595744680851064,
+      "loss": 0.568,
+      "step": 635
+    },
+    {
+      "epoch": 2.723404255319149,
+      "grad_norm": 0.31960707902908325,
+      "learning_rate": 0.00045531914893617024,
+      "loss": 0.5573,
+      "step": 640
+    },
+    {
+      "epoch": 2.74468085106383,
+      "grad_norm": 0.17629045248031616,
+      "learning_rate": 0.000451063829787234,
+      "loss": 0.5793,
+      "step": 645
+    },
+    {
+      "epoch": 2.7659574468085104,
+      "grad_norm": 0.3344897925853729,
+      "learning_rate": 0.00044680851063829785,
+      "loss": 0.5782,
+      "step": 650
+    },
+    {
+      "epoch": 2.7872340425531914,
+      "grad_norm": 0.6426132917404175,
+      "learning_rate": 0.0004425531914893617,
+      "loss": 0.6065,
+      "step": 655
+    },
+    {
+      "epoch": 2.8085106382978724,
+      "grad_norm": 0.4149859547615051,
+      "learning_rate": 0.00043829787234042557,
+      "loss": 0.6095,
+      "step": 660
+    },
+    {
+      "epoch": 2.829787234042553,
+      "grad_norm": 0.2638397812843323,
+      "learning_rate": 0.0004340425531914894,
+      "loss": 0.5651,
+      "step": 665
+    },
+    {
+      "epoch": 2.851063829787234,
+      "grad_norm": 0.47826263308525085,
+      "learning_rate": 0.0004297872340425532,
+      "loss": 0.6366,
+      "step": 670
+    },
+    {
+      "epoch": 2.872340425531915,
+      "grad_norm": 0.47488388419151306,
+      "learning_rate": 0.000425531914893617,
+      "loss": 0.5498,
+      "step": 675
+    },
+    {
+      "epoch": 2.8936170212765955,
+      "grad_norm": 0.29856908321380615,
+      "learning_rate": 0.00042127659574468085,
+      "loss": 0.5576,
+      "step": 680
+    },
+    {
+      "epoch": 2.9148936170212765,
+      "grad_norm": 0.3228590488433838,
+      "learning_rate": 0.0004170212765957447,
+      "loss": 0.5527,
+      "step": 685
+    },
+    {
+      "epoch": 2.9361702127659575,
+      "grad_norm": 0.28109100461006165,
+      "learning_rate": 0.0004127659574468085,
+      "loss": 0.5421,
+      "step": 690
+    },
+    {
+      "epoch": 2.9574468085106385,
+      "grad_norm": 0.43624716997146606,
+      "learning_rate": 0.00040851063829787235,
+      "loss": 0.5419,
+      "step": 695
+    },
+    {
+      "epoch": 2.978723404255319,
+      "grad_norm": 0.33003830909729004,
+      "learning_rate": 0.00040425531914893613,
+      "loss": 0.5614,
+      "step": 700
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 0.8071190118789673,
+      "learning_rate": 0.0004,
+      "loss": 0.5628,
+      "step": 705
+    },
+    {
+      "epoch": 3.0,
+      "eval_accuracy": 0.7048,
+      "eval_loss": 0.5658010244369507,
+      "eval_runtime": 52.264,
+      "eval_samples_per_second": 71.751,
+      "eval_steps_per_second": 1.129,
+      "step": 705
+    },
+    {
+      "epoch": 3.021276595744681,
+      "grad_norm": 0.34832167625427246,
+      "learning_rate": 0.00039574468085106385,
+      "loss": 0.5412,
+      "step": 710
+    },
+    {
+      "epoch": 3.0425531914893615,
+      "grad_norm": 0.3105883002281189,
+      "learning_rate": 0.0003914893617021277,
+      "loss": 0.5428,
+      "step": 715
+    },
+    {
+      "epoch": 3.0638297872340425,
+      "grad_norm": 0.48978525400161743,
+      "learning_rate": 0.0003872340425531915,
+      "loss": 0.5074,
+      "step": 720
+    },
+    {
+      "epoch": 3.0851063829787235,
+      "grad_norm": 0.3323807120323181,
+      "learning_rate": 0.0003829787234042553,
+      "loss": 0.5388,
+      "step": 725
+    },
+    {
+      "epoch": 3.106382978723404,
+      "grad_norm": 0.23931725323200226,
+      "learning_rate": 0.00037872340425531913,
+      "loss": 0.5329,
+      "step": 730
+    },
+    {
+      "epoch": 3.127659574468085,
+      "grad_norm": 0.4094422459602356,
+      "learning_rate": 0.00037446808510638297,
+      "loss": 0.5256,
+      "step": 735
+    },
+    {
+      "epoch": 3.148936170212766,
+      "grad_norm": 0.2427910566329956,
+      "learning_rate": 0.0003702127659574468,
+      "loss": 0.4994,
+      "step": 740
+    },
+    {
+      "epoch": 3.1702127659574466,
+      "grad_norm": 0.46753978729248047,
+      "learning_rate": 0.00036595744680851063,
+      "loss": 0.5841,
+      "step": 745
+    },
+    {
+      "epoch": 3.1914893617021276,
+      "grad_norm": 0.60309898853302,
+      "learning_rate": 0.0003617021276595745,
+      "loss": 0.5018,
+      "step": 750
+    },
+    {
+      "epoch": 3.2127659574468086,
+      "grad_norm": 0.32367798686027527,
+      "learning_rate": 0.0003574468085106383,
+      "loss": 0.5112,
+      "step": 755
+    },
+    {
+      "epoch": 3.2340425531914896,
+ "grad_norm": 0.31850096583366394,
1098
+ "learning_rate": 0.00035319148936170213,
1099
+ "loss": 0.5197,
1100
+ "step": 760
1101
+ },
1102
+ {
1103
+ "epoch": 3.25531914893617,
1104
+ "grad_norm": 0.40993842482566833,
1105
+ "learning_rate": 0.00034893617021276597,
1106
+ "loss": 0.491,
1107
+ "step": 765
1108
+ },
1109
+ {
1110
+ "epoch": 3.276595744680851,
1111
+ "grad_norm": 0.31502920389175415,
1112
+ "learning_rate": 0.0003446808510638298,
1113
+ "loss": 0.5134,
1114
+ "step": 770
1115
+ },
1116
+ {
1117
+ "epoch": 3.297872340425532,
1118
+ "grad_norm": 0.34986236691474915,
1119
+ "learning_rate": 0.00034042553191489364,
1120
+ "loss": 0.5093,
1121
+ "step": 775
1122
+ },
1123
+ {
1124
+ "epoch": 3.3191489361702127,
1125
+ "grad_norm": 0.30818113684654236,
1126
+ "learning_rate": 0.0003361702127659574,
1127
+ "loss": 0.4668,
1128
+ "step": 780
1129
+ },
1130
+ {
1131
+ "epoch": 3.3404255319148937,
1132
+ "grad_norm": 0.45690372586250305,
1133
+ "learning_rate": 0.00033191489361702125,
1134
+ "loss": 0.4793,
1135
+ "step": 785
1136
+ },
1137
+ {
1138
+ "epoch": 3.3617021276595747,
1139
+ "grad_norm": 0.431671142578125,
1140
+ "learning_rate": 0.0003276595744680851,
1141
+ "loss": 0.5449,
1142
+ "step": 790
1143
+ },
1144
+ {
1145
+ "epoch": 3.382978723404255,
1146
+ "grad_norm": 0.6079233288764954,
1147
+ "learning_rate": 0.00032340425531914897,
1148
+ "loss": 0.5055,
1149
+ "step": 795
1150
+ },
1151
+ {
+ "epoch": 3.404255319148936,
+ "grad_norm": 0.25394123792648315,
+ "learning_rate": 0.0003191489361702128,
+ "loss": 0.5137,
+ "step": 800
+ },
+ {
+ "epoch": 3.425531914893617,
+ "grad_norm": 0.2768719494342804,
+ "learning_rate": 0.00031489361702127664,
+ "loss": 0.5378,
+ "step": 805
+ },
+ {
+ "epoch": 3.4468085106382977,
+ "grad_norm": 0.33412039279937744,
+ "learning_rate": 0.0003106382978723404,
+ "loss": 0.5529,
+ "step": 810
+ },
+ {
+ "epoch": 3.4680851063829787,
+ "grad_norm": 0.45218709111213684,
+ "learning_rate": 0.00030638297872340425,
+ "loss": 0.514,
+ "step": 815
+ },
+ {
+ "epoch": 3.4893617021276597,
+ "grad_norm": 0.29416921734809875,
+ "learning_rate": 0.0003021276595744681,
+ "loss": 0.471,
+ "step": 820
+ },
+ {
+ "epoch": 3.5106382978723403,
+ "grad_norm": 0.4108869433403015,
+ "learning_rate": 0.0002978723404255319,
+ "loss": 0.5222,
+ "step": 825
+ },
+ {
+ "epoch": 3.5319148936170213,
+ "grad_norm": 0.5049691200256348,
+ "learning_rate": 0.00029361702127659575,
+ "loss": 0.5103,
+ "step": 830
+ },
+ {
+ "epoch": 3.5531914893617023,
+ "grad_norm": 0.37521079182624817,
+ "learning_rate": 0.00028936170212765953,
+ "loss": 0.5088,
+ "step": 835
+ },
+ {
+ "epoch": 3.574468085106383,
+ "grad_norm": 0.6042494177818298,
+ "learning_rate": 0.0002851063829787234,
+ "loss": 0.4886,
+ "step": 840
+ },
+ {
+ "epoch": 3.595744680851064,
+ "grad_norm": 0.3379281163215637,
+ "learning_rate": 0.00028085106382978725,
+ "loss": 0.4878,
+ "step": 845
+ },
+ {
+ "epoch": 3.617021276595745,
+ "grad_norm": 0.42538291215896606,
+ "learning_rate": 0.0002765957446808511,
+ "loss": 0.5241,
+ "step": 850
+ },
+ {
+ "epoch": 3.6382978723404253,
+ "grad_norm": 0.34973302483558655,
+ "learning_rate": 0.0002723404255319149,
+ "loss": 0.497,
+ "step": 855
+ },
+ {
+ "epoch": 3.6595744680851063,
+ "grad_norm": 0.5937588214874268,
+ "learning_rate": 0.00026808510638297875,
+ "loss": 0.5004,
+ "step": 860
+ },
+ {
+ "epoch": 3.6808510638297873,
+ "grad_norm": 0.3566235601902008,
+ "learning_rate": 0.00026382978723404253,
+ "loss": 0.5192,
+ "step": 865
+ },
+ {
+ "epoch": 3.702127659574468,
+ "grad_norm": 0.7297813296318054,
+ "learning_rate": 0.00025957446808510637,
+ "loss": 0.5313,
+ "step": 870
+ },
+ {
+ "epoch": 3.723404255319149,
+ "grad_norm": 0.3060586452484131,
+ "learning_rate": 0.0002553191489361702,
+ "loss": 0.5057,
+ "step": 875
+ },
+ {
+ "epoch": 3.74468085106383,
+ "grad_norm": 0.3572905361652374,
+ "learning_rate": 0.00025106382978723403,
+ "loss": 0.5078,
+ "step": 880
+ },
+ {
+ "epoch": 3.7659574468085104,
+ "grad_norm": 0.5359181761741638,
+ "learning_rate": 0.00024680851063829787,
+ "loss": 0.4953,
+ "step": 885
+ },
+ {
+ "epoch": 3.7872340425531914,
+ "grad_norm": 0.676404595375061,
+ "learning_rate": 0.0002425531914893617,
+ "loss": 0.4878,
+ "step": 890
+ },
+ {
+ "epoch": 3.8085106382978724,
+ "grad_norm": 0.7736416459083557,
+ "learning_rate": 0.00023829787234042556,
+ "loss": 0.4897,
+ "step": 895
+ },
+ {
+ "epoch": 3.829787234042553,
+ "grad_norm": 0.6416388154029846,
+ "learning_rate": 0.00023404255319148937,
+ "loss": 0.5031,
+ "step": 900
+ },
+ {
+ "epoch": 3.851063829787234,
+ "grad_norm": 1.1011937856674194,
+ "learning_rate": 0.0002297872340425532,
+ "loss": 0.4563,
+ "step": 905
+ },
+ {
+ "epoch": 3.872340425531915,
+ "grad_norm": 0.4412100613117218,
+ "learning_rate": 0.000225531914893617,
+ "loss": 0.525,
+ "step": 910
+ },
+ {
+ "epoch": 3.8936170212765955,
+ "grad_norm": 0.6614885926246643,
+ "learning_rate": 0.00022127659574468084,
+ "loss": 0.5163,
+ "step": 915
+ },
+ {
+ "epoch": 3.9148936170212765,
+ "grad_norm": 0.38106369972229004,
+ "learning_rate": 0.0002170212765957447,
+ "loss": 0.5182,
+ "step": 920
+ },
+ {
+ "epoch": 3.9361702127659575,
+ "grad_norm": 0.44818058609962463,
+ "learning_rate": 0.0002127659574468085,
+ "loss": 0.4875,
+ "step": 925
+ },
+ {
+ "epoch": 3.9574468085106385,
+ "grad_norm": 0.3101024925708771,
+ "learning_rate": 0.00020851063829787234,
+ "loss": 0.5498,
+ "step": 930
+ },
+ {
+ "epoch": 3.978723404255319,
+ "grad_norm": 0.4035079777240753,
+ "learning_rate": 0.00020425531914893618,
+ "loss": 0.5139,
+ "step": 935
+ },
+ {
+ "epoch": 4.0,
+ "grad_norm": 0.4626338481903076,
+ "learning_rate": 0.0002,
+ "loss": 0.4683,
+ "step": 940
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.7290666666666666,
+ "eval_loss": 0.5313977003097534,
+ "eval_runtime": 52.259,
+ "eval_samples_per_second": 71.758,
+ "eval_steps_per_second": 1.129,
+ "step": 940
+ },
+ {
+ "epoch": 4.0212765957446805,
+ "grad_norm": 0.3656058609485626,
+ "learning_rate": 0.00019574468085106384,
+ "loss": 0.4576,
+ "step": 945
+ },
+ {
+ "epoch": 4.042553191489362,
+ "grad_norm": 0.6442248225212097,
+ "learning_rate": 0.00019148936170212765,
+ "loss": 0.4869,
+ "step": 950
+ },
+ {
+ "epoch": 4.0638297872340425,
+ "grad_norm": 0.8725343942642212,
+ "learning_rate": 0.00018723404255319148,
+ "loss": 0.4081,
+ "step": 955
+ },
+ {
+ "epoch": 4.085106382978723,
+ "grad_norm": 0.5488789677619934,
+ "learning_rate": 0.00018297872340425532,
+ "loss": 0.3774,
+ "step": 960
+ },
+ {
+ "epoch": 4.1063829787234045,
+ "grad_norm": 0.45871075987815857,
+ "learning_rate": 0.00017872340425531915,
+ "loss": 0.3895,
+ "step": 965
+ },
+ {
+ "epoch": 4.127659574468085,
+ "grad_norm": 0.7183250784873962,
+ "learning_rate": 0.00017446808510638298,
+ "loss": 0.4216,
+ "step": 970
+ },
+ {
+ "epoch": 4.148936170212766,
+ "grad_norm": 0.43252503871917725,
+ "learning_rate": 0.00017021276595744682,
+ "loss": 0.4306,
+ "step": 975
+ },
+ {
+ "epoch": 4.170212765957447,
+ "grad_norm": 0.5714681148529053,
+ "learning_rate": 0.00016595744680851062,
+ "loss": 0.4607,
+ "step": 980
+ },
+ {
+ "epoch": 4.191489361702128,
+ "grad_norm": 0.5099291801452637,
+ "learning_rate": 0.00016170212765957449,
+ "loss": 0.372,
+ "step": 985
+ },
+ {
+ "epoch": 4.212765957446808,
+ "grad_norm": 0.5010551810264587,
+ "learning_rate": 0.00015744680851063832,
+ "loss": 0.4414,
+ "step": 990
+ },
+ {
+ "epoch": 4.23404255319149,
+ "grad_norm": 0.6585486531257629,
+ "learning_rate": 0.00015319148936170213,
+ "loss": 0.4191,
+ "step": 995
+ },
+ {
+ "epoch": 4.25531914893617,
+ "grad_norm": 0.5043871402740479,
+ "learning_rate": 0.00014893617021276596,
+ "loss": 0.4273,
+ "step": 1000
+ },
+ {
+ "epoch": 4.276595744680851,
+ "grad_norm": 0.4368508756160736,
+ "learning_rate": 0.00014468085106382977,
+ "loss": 0.4329,
+ "step": 1005
+ },
+ {
+ "epoch": 4.297872340425532,
+ "grad_norm": 0.5174155235290527,
+ "learning_rate": 0.00014042553191489363,
+ "loss": 0.4256,
+ "step": 1010
+ },
+ {
+ "epoch": 4.319148936170213,
+ "grad_norm": 0.7088821530342102,
+ "learning_rate": 0.00013617021276595746,
+ "loss": 0.4025,
+ "step": 1015
+ },
+ {
+ "epoch": 4.340425531914893,
+ "grad_norm": 0.41731932759284973,
+ "learning_rate": 0.00013191489361702127,
+ "loss": 0.4018,
+ "step": 1020
+ },
+ {
+ "epoch": 4.361702127659575,
+ "grad_norm": 0.47780218720436096,
+ "learning_rate": 0.0001276595744680851,
+ "loss": 0.4683,
+ "step": 1025
+ },
+ {
+ "epoch": 4.382978723404255,
+ "grad_norm": 0.49915027618408203,
+ "learning_rate": 0.00012340425531914893,
+ "loss": 0.4643,
+ "step": 1030
+ },
+ {
+ "epoch": 4.404255319148936,
+ "grad_norm": 0.5682059526443481,
+ "learning_rate": 0.00011914893617021278,
+ "loss": 0.4259,
+ "step": 1035
+ },
+ {
+ "epoch": 4.425531914893617,
+ "grad_norm": 0.36220914125442505,
+ "learning_rate": 0.0001148936170212766,
+ "loss": 0.4116,
+ "step": 1040
+ },
+ {
+ "epoch": 4.446808510638298,
+ "grad_norm": 0.5478158593177795,
+ "learning_rate": 0.00011063829787234042,
+ "loss": 0.4299,
+ "step": 1045
+ },
+ {
+ "epoch": 4.468085106382979,
+ "grad_norm": 0.5897641181945801,
+ "learning_rate": 0.00010638297872340425,
+ "loss": 0.3619,
+ "step": 1050
+ },
+ {
+ "epoch": 4.48936170212766,
+ "grad_norm": 1.084243893623352,
+ "learning_rate": 0.00010212765957446809,
+ "loss": 0.4211,
+ "step": 1055
+ },
+ {
+ "epoch": 4.51063829787234,
+ "grad_norm": 0.7980880737304688,
+ "learning_rate": 9.787234042553192e-05,
+ "loss": 0.4053,
+ "step": 1060
+ },
+ {
+ "epoch": 4.531914893617021,
+ "grad_norm": 0.9330500364303589,
+ "learning_rate": 9.361702127659574e-05,
+ "loss": 0.4183,
+ "step": 1065
+ },
+ {
+ "epoch": 4.553191489361702,
+ "grad_norm": 0.40023094415664673,
+ "learning_rate": 8.936170212765958e-05,
+ "loss": 0.4343,
+ "step": 1070
+ },
+ {
+ "epoch": 4.574468085106383,
+ "grad_norm": 0.6411470770835876,
+ "learning_rate": 8.510638297872341e-05,
+ "loss": 0.4096,
+ "step": 1075
+ },
+ {
+ "epoch": 4.595744680851064,
+ "grad_norm": 0.4613640308380127,
+ "learning_rate": 8.085106382978724e-05,
+ "loss": 0.4089,
+ "step": 1080
+ },
+ {
+ "epoch": 4.617021276595745,
+ "grad_norm": 0.5364215970039368,
+ "learning_rate": 7.659574468085106e-05,
+ "loss": 0.406,
+ "step": 1085
+ },
+ {
+ "epoch": 4.638297872340425,
+ "grad_norm": 0.7170926928520203,
+ "learning_rate": 7.234042553191488e-05,
+ "loss": 0.3827,
+ "step": 1090
+ },
+ {
+ "epoch": 4.659574468085106,
+ "grad_norm": 0.5427092909812927,
+ "learning_rate": 6.808510638297873e-05,
+ "loss": 0.4274,
+ "step": 1095
+ },
+ {
+ "epoch": 4.680851063829787,
+ "grad_norm": 0.44160687923431396,
+ "learning_rate": 6.382978723404255e-05,
+ "loss": 0.4182,
+ "step": 1100
+ },
+ {
+ "epoch": 4.702127659574468,
+ "grad_norm": 0.5841237902641296,
+ "learning_rate": 5.957446808510639e-05,
+ "loss": 0.4459,
+ "step": 1105
+ },
+ {
+ "epoch": 4.723404255319149,
+ "grad_norm": 0.6145776510238647,
+ "learning_rate": 5.531914893617021e-05,
+ "loss": 0.4415,
+ "step": 1110
+ },
+ {
+ "epoch": 4.74468085106383,
+ "grad_norm": 0.44807735085487366,
+ "learning_rate": 5.1063829787234044e-05,
+ "loss": 0.4248,
+ "step": 1115
+ },
+ {
+ "epoch": 4.76595744680851,
+ "grad_norm": 0.7016127109527588,
+ "learning_rate": 4.680851063829787e-05,
+ "loss": 0.3613,
+ "step": 1120
+ },
+ {
+ "epoch": 4.787234042553192,
+ "grad_norm": 0.5572742819786072,
+ "learning_rate": 4.2553191489361704e-05,
+ "loss": 0.4027,
+ "step": 1125
+ },
+ {
+ "epoch": 4.808510638297872,
+ "grad_norm": 0.5368435978889465,
+ "learning_rate": 3.829787234042553e-05,
+ "loss": 0.4111,
+ "step": 1130
+ },
+ {
+ "epoch": 4.829787234042553,
+ "grad_norm": 0.4862489700317383,
+ "learning_rate": 3.4042553191489365e-05,
+ "loss": 0.3663,
+ "step": 1135
+ },
+ {
+ "epoch": 4.851063829787234,
+ "grad_norm": 0.6198825240135193,
+ "learning_rate": 2.9787234042553195e-05,
+ "loss": 0.3791,
+ "step": 1140
+ },
+ {
+ "epoch": 4.872340425531915,
+ "grad_norm": 0.5688868165016174,
+ "learning_rate": 2.5531914893617022e-05,
+ "loss": 0.393,
+ "step": 1145
+ },
+ {
+ "epoch": 4.8936170212765955,
+ "grad_norm": 0.44319066405296326,
+ "learning_rate": 2.1276595744680852e-05,
+ "loss": 0.3728,
+ "step": 1150
+ },
+ {
+ "epoch": 4.914893617021277,
+ "grad_norm": 0.6268962621688843,
+ "learning_rate": 1.7021276595744682e-05,
+ "loss": 0.3832,
+ "step": 1155
+ },
+ {
+ "epoch": 4.9361702127659575,
+ "grad_norm": 0.41478314995765686,
+ "learning_rate": 1.2765957446808511e-05,
+ "loss": 0.3647,
+ "step": 1160
+ },
+ {
+ "epoch": 4.957446808510638,
+ "grad_norm": 0.8043156266212463,
+ "learning_rate": 8.510638297872341e-06,
+ "loss": 0.3705,
+ "step": 1165
+ },
+ {
+ "epoch": 4.9787234042553195,
+ "grad_norm": 0.5503541827201843,
+ "learning_rate": 4.255319148936171e-06,
+ "loss": 0.3862,
+ "step": 1170
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 0.6788877844810486,
+ "learning_rate": 0.0,
+ "loss": 0.3694,
+ "step": 1175
+ },
+ {
+ "epoch": 5.0,
+ "eval_accuracy": 0.7538666666666667,
+ "eval_loss": 0.5220404863357544,
+ "eval_runtime": 52.803,
+ "eval_samples_per_second": 71.019,
+ "eval_steps_per_second": 1.117,
+ "step": 1175
+ },
+ {
+ "epoch": 5.0,
+ "step": 1175,
+ "total_flos": 5.8118992210944e+18,
+ "train_loss": 0.5645027552259729,
+ "train_runtime": 2886.1261,
+ "train_samples_per_second": 25.986,
+ "train_steps_per_second": 0.407
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1175,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 5.8118992210944e+18,
+ "train_batch_size": 64,
+ "trial_name": null,
+ "trial_params": null
+ }