pszemraj committed
Commit adf1478 · 1 Parent(s): 89c63f1

End of training

Files changed (5)
  1. README.md +1 -1
  2. all_results.json +16 -0
  3. eval_results.json +10 -0
  4. train_results.json +9 -0
  5. trainer_state.json +1222 -0
README.md CHANGED
@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # griffin-1024-llama3t-8layer-simple_wikipedia_LM-vN
 
-This model is a fine-tuned version of [griffin-1024-llama3t-8layer](https://huggingface.co/griffin-1024-llama3t-8layer) on an unknown dataset.
+This model is a fine-tuned version of [griffin-1024-llama3t-8layer](https://huggingface.co/griffin-1024-llama3t-8layer) on the pszemraj/simple_wikipedia_LM dataset.
 It achieves the following results on the evaluation set:
 - Loss: 4.3584
 - Accuracy: 0.3789
all_results.json ADDED
@@ -0,0 +1,16 @@
+{
+    "epoch": 1.995634549423137,
+    "eval_accuracy": 0.3789325513196481,
+    "eval_loss": 4.358436584472656,
+    "eval_runtime": 20.603,
+    "eval_samples": 250,
+    "eval_samples_per_second": 12.134,
+    "eval_steps_per_second": 3.058,
+    "perplexity": 78.13488159488827,
+    "total_flos": 6.441101073108173e+16,
+    "train_loss": 8.280340445041656,
+    "train_runtime": 18888.2342,
+    "train_samples": 51310,
+    "train_samples_per_second": 5.433,
+    "train_steps_per_second": 0.042
+}
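The reported perplexity is simply the exponential of the evaluation loss, the usual causal-LM relation and the way run_clm-style training scripts typically compute it. A minimal check in Python, using the values from all_results.json:

```python
import math

eval_loss = 4.358436584472656  # "eval_loss" from all_results.json
perplexity = math.exp(eval_loss)  # causal-LM perplexity = exp(mean cross-entropy)
print(round(perplexity, 2))  # ~78.13, matching the "perplexity" field
```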
eval_results.json ADDED
@@ -0,0 +1,10 @@
+{
+    "epoch": 1.995634549423137,
+    "eval_accuracy": 0.3789325513196481,
+    "eval_loss": 4.358436584472656,
+    "eval_runtime": 20.603,
+    "eval_samples": 250,
+    "eval_samples_per_second": 12.134,
+    "eval_steps_per_second": 3.058,
+    "perplexity": 78.13488159488827
+}
train_results.json ADDED
@@ -0,0 +1,9 @@
+{
+    "epoch": 1.995634549423137,
+    "total_flos": 6.441101073108173e+16,
+    "train_loss": 8.280340445041656,
+    "train_runtime": 18888.2342,
+    "train_samples": 51310,
+    "train_samples_per_second": 5.433,
+    "train_steps_per_second": 0.042
+}
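The throughput figures are consistent with the sample count, epoch count, and wall-clock runtime. A small sketch, assuming the usual Trainer accounting (train_samples × num_train_epochs over train_runtime, and max_steps over train_runtime):

```python
train_samples = 51310          # "train_samples"
num_train_epochs = 2           # "num_train_epochs" in trainer_state.json
train_runtime = 18888.2342     # seconds
max_steps = 800                # "max_steps" in trainer_state.json

samples_per_second = train_samples * num_train_epochs / train_runtime
steps_per_second = max_steps / train_runtime
print(f"{samples_per_second:.3f}")  # ~5.433, matches train_samples_per_second
print(f"{steps_per_second:.3f}")    # ~0.042, matches train_steps_per_second
```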
trainer_state.json ADDED
@@ -0,0 +1,1222 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.995634549423137,
5
+ "eval_steps": 100,
6
+ "global_step": 800,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.012472715933894606,
13
+ "grad_norm": 6.119478225708008,
14
+ "learning_rate": 3.75e-05,
15
+ "loss": 36.8721,
16
+ "step": 5
17
+ },
18
+ {
19
+ "epoch": 0.024945431867789213,
20
+ "grad_norm": 2.9732778072357178,
21
+ "learning_rate": 7.5e-05,
22
+ "loss": 33.0439,
23
+ "step": 10
24
+ },
25
+ {
26
+ "epoch": 0.037418147801683815,
27
+ "grad_norm": 1.5332609415054321,
28
+ "learning_rate": 0.0001125,
29
+ "loss": 30.1165,
30
+ "step": 15
31
+ },
32
+ {
33
+ "epoch": 0.049890863735578425,
34
+ "grad_norm": 1.2270578145980835,
35
+ "learning_rate": 0.00015,
36
+ "loss": 28.647,
37
+ "step": 20
38
+ },
39
+ {
40
+ "epoch": 0.06236357966947303,
41
+ "grad_norm": 1.053142786026001,
42
+ "learning_rate": 0.00018749999999999998,
43
+ "loss": 26.0629,
44
+ "step": 25
45
+ },
46
+ {
47
+ "epoch": 0.07483629560336763,
48
+ "grad_norm": 1.0131248235702515,
49
+ "learning_rate": 0.000225,
50
+ "loss": 23.8703,
51
+ "step": 30
52
+ },
53
+ {
54
+ "epoch": 0.08730901153726224,
55
+ "grad_norm": 0.9197985529899597,
56
+ "learning_rate": 0.0002625,
57
+ "loss": 21.521,
58
+ "step": 35
59
+ },
60
+ {
61
+ "epoch": 0.09978172747115685,
62
+ "grad_norm": 1.0926002264022827,
63
+ "learning_rate": 0.0003,
64
+ "loss": 19.8433,
65
+ "step": 40
66
+ },
67
+ {
68
+ "epoch": 0.11225444340505145,
69
+ "grad_norm": 0.7152827382087708,
70
+ "learning_rate": 0.0003,
71
+ "loss": 18.618,
72
+ "step": 45
73
+ },
74
+ {
75
+ "epoch": 0.12472715933894606,
76
+ "grad_norm": 0.6178381443023682,
77
+ "learning_rate": 0.0003,
78
+ "loss": 17.3644,
79
+ "step": 50
80
+ },
81
+ {
82
+ "epoch": 0.13719987527284067,
83
+ "grad_norm": 0.48063215613365173,
84
+ "learning_rate": 0.0003,
85
+ "loss": 16.6105,
86
+ "step": 55
87
+ },
88
+ {
89
+ "epoch": 0.14967259120673526,
90
+ "grad_norm": 0.46090102195739746,
91
+ "learning_rate": 0.0003,
92
+ "loss": 16.2326,
93
+ "step": 60
94
+ },
95
+ {
96
+ "epoch": 0.16214530714062989,
97
+ "grad_norm": 0.4266461730003357,
98
+ "learning_rate": 0.0003,
99
+ "loss": 15.8385,
100
+ "step": 65
101
+ },
102
+ {
103
+ "epoch": 0.17461802307452448,
104
+ "grad_norm": 0.3876805901527405,
105
+ "learning_rate": 0.0003,
106
+ "loss": 15.3119,
107
+ "step": 70
108
+ },
109
+ {
110
+ "epoch": 0.18709073900841908,
111
+ "grad_norm": 0.3796117603778839,
112
+ "learning_rate": 0.0003,
113
+ "loss": 15.2481,
114
+ "step": 75
115
+ },
116
+ {
117
+ "epoch": 0.1995634549423137,
118
+ "grad_norm": 0.37646082043647766,
119
+ "learning_rate": 0.0003,
120
+ "loss": 14.7319,
121
+ "step": 80
122
+ },
123
+ {
124
+ "epoch": 0.2120361708762083,
125
+ "grad_norm": 0.3688748776912689,
126
+ "learning_rate": 0.0003,
127
+ "loss": 14.6364,
128
+ "step": 85
129
+ },
130
+ {
131
+ "epoch": 0.2245088868101029,
132
+ "grad_norm": 0.37435677647590637,
133
+ "learning_rate": 0.0003,
134
+ "loss": 14.2134,
135
+ "step": 90
136
+ },
137
+ {
138
+ "epoch": 0.23698160274399752,
139
+ "grad_norm": 0.36440223455429077,
140
+ "learning_rate": 0.0003,
141
+ "loss": 13.9198,
142
+ "step": 95
143
+ },
144
+ {
145
+ "epoch": 0.2494543186778921,
146
+ "grad_norm": 0.33530500531196594,
147
+ "learning_rate": 0.0003,
148
+ "loss": 13.6044,
149
+ "step": 100
150
+ },
151
+ {
152
+ "epoch": 0.2494543186778921,
153
+ "eval_accuracy": 0.007913978494623657,
154
+ "eval_loss": 12.544113159179688,
155
+ "eval_runtime": 18.5829,
156
+ "eval_samples_per_second": 13.453,
157
+ "eval_steps_per_second": 3.39,
158
+ "step": 100
159
+ },
160
+ {
161
+ "epoch": 0.26192703461178674,
162
+ "grad_norm": 0.3251523971557617,
163
+ "learning_rate": 0.0003,
164
+ "loss": 13.3181,
165
+ "step": 105
166
+ },
167
+ {
168
+ "epoch": 0.27439975054568133,
169
+ "grad_norm": 0.3473041355609894,
170
+ "learning_rate": 0.0003,
171
+ "loss": 12.9976,
172
+ "step": 110
173
+ },
174
+ {
175
+ "epoch": 0.2868724664795759,
176
+ "grad_norm": 0.3266255557537079,
177
+ "learning_rate": 0.0003,
178
+ "loss": 12.7667,
179
+ "step": 115
180
+ },
181
+ {
182
+ "epoch": 0.2993451824134705,
183
+ "grad_norm": 0.35194671154022217,
184
+ "learning_rate": 0.0003,
185
+ "loss": 12.7544,
186
+ "step": 120
187
+ },
188
+ {
189
+ "epoch": 0.3118178983473651,
190
+ "grad_norm": 0.34635770320892334,
191
+ "learning_rate": 0.0003,
192
+ "loss": 12.2756,
193
+ "step": 125
194
+ },
195
+ {
196
+ "epoch": 0.32429061428125977,
197
+ "grad_norm": 0.3480404019355774,
198
+ "learning_rate": 0.0003,
199
+ "loss": 12.1192,
200
+ "step": 130
201
+ },
202
+ {
203
+ "epoch": 0.33676333021515437,
204
+ "grad_norm": 0.3309994339942932,
205
+ "learning_rate": 0.0003,
206
+ "loss": 11.8339,
207
+ "step": 135
208
+ },
209
+ {
210
+ "epoch": 0.34923604614904896,
211
+ "grad_norm": 0.33558282256126404,
212
+ "learning_rate": 0.0003,
213
+ "loss": 11.6745,
214
+ "step": 140
215
+ },
216
+ {
217
+ "epoch": 0.36170876208294356,
218
+ "grad_norm": 0.3359847664833069,
219
+ "learning_rate": 0.0003,
220
+ "loss": 11.3363,
221
+ "step": 145
222
+ },
223
+ {
224
+ "epoch": 0.37418147801683815,
225
+ "grad_norm": 0.33947232365608215,
226
+ "learning_rate": 0.0003,
227
+ "loss": 11.0303,
228
+ "step": 150
229
+ },
230
+ {
231
+ "epoch": 0.38665419395073275,
232
+ "grad_norm": 0.32984089851379395,
233
+ "learning_rate": 0.0003,
234
+ "loss": 10.9271,
235
+ "step": 155
236
+ },
237
+ {
238
+ "epoch": 0.3991269098846274,
239
+ "grad_norm": 0.3498048782348633,
240
+ "learning_rate": 0.0003,
241
+ "loss": 10.6215,
242
+ "step": 160
243
+ },
244
+ {
245
+ "epoch": 0.411599625818522,
246
+ "grad_norm": 0.354889839887619,
247
+ "learning_rate": 0.0003,
248
+ "loss": 10.5165,
249
+ "step": 165
250
+ },
251
+ {
252
+ "epoch": 0.4240723417524166,
253
+ "grad_norm": 0.34426406025886536,
254
+ "learning_rate": 0.0003,
255
+ "loss": 10.0716,
256
+ "step": 170
257
+ },
258
+ {
259
+ "epoch": 0.4365450576863112,
260
+ "grad_norm": 0.34653356671333313,
261
+ "learning_rate": 0.0003,
262
+ "loss": 10.0709,
263
+ "step": 175
264
+ },
265
+ {
266
+ "epoch": 0.4490177736202058,
267
+ "grad_norm": 0.3454643189907074,
268
+ "learning_rate": 0.0003,
269
+ "loss": 9.7226,
270
+ "step": 180
271
+ },
272
+ {
273
+ "epoch": 0.4614904895541004,
274
+ "grad_norm": 0.3724479377269745,
275
+ "learning_rate": 0.0003,
276
+ "loss": 9.5827,
277
+ "step": 185
278
+ },
279
+ {
280
+ "epoch": 0.47396320548799503,
281
+ "grad_norm": 0.37687671184539795,
282
+ "learning_rate": 0.0003,
283
+ "loss": 9.3702,
284
+ "step": 190
285
+ },
286
+ {
287
+ "epoch": 0.4864359214218896,
288
+ "grad_norm": 0.3670942187309265,
289
+ "learning_rate": 0.0003,
290
+ "loss": 9.2377,
291
+ "step": 195
292
+ },
293
+ {
294
+ "epoch": 0.4989086373557842,
295
+ "grad_norm": 0.3864516019821167,
296
+ "learning_rate": 0.0003,
297
+ "loss": 8.9524,
298
+ "step": 200
299
+ },
300
+ {
301
+ "epoch": 0.4989086373557842,
302
+ "eval_accuracy": 0.04734701857282502,
303
+ "eval_loss": 8.425415992736816,
304
+ "eval_runtime": 17.9427,
305
+ "eval_samples_per_second": 13.933,
306
+ "eval_steps_per_second": 3.511,
307
+ "step": 200
308
+ },
309
+ {
310
+ "epoch": 0.5113813532896788,
311
+ "grad_norm": 0.3540992736816406,
312
+ "learning_rate": 0.0003,
313
+ "loss": 8.9811,
314
+ "step": 205
315
+ },
316
+ {
317
+ "epoch": 0.5238540692235735,
318
+ "grad_norm": 0.35756129026412964,
319
+ "learning_rate": 0.0003,
320
+ "loss": 8.6522,
321
+ "step": 210
322
+ },
323
+ {
324
+ "epoch": 0.536326785157468,
325
+ "grad_norm": 0.38473081588745117,
326
+ "learning_rate": 0.0003,
327
+ "loss": 8.6516,
328
+ "step": 215
329
+ },
330
+ {
331
+ "epoch": 0.5487995010913627,
332
+ "grad_norm": 0.3616325259208679,
333
+ "learning_rate": 0.0003,
334
+ "loss": 8.5213,
335
+ "step": 220
336
+ },
337
+ {
338
+ "epoch": 0.5612722170252572,
339
+ "grad_norm": 0.375959187746048,
340
+ "learning_rate": 0.0003,
341
+ "loss": 8.3109,
342
+ "step": 225
343
+ },
344
+ {
345
+ "epoch": 0.5737449329591519,
346
+ "grad_norm": 0.38421833515167236,
347
+ "learning_rate": 0.0003,
348
+ "loss": 8.2747,
349
+ "step": 230
350
+ },
351
+ {
352
+ "epoch": 0.5862176488930465,
353
+ "grad_norm": 0.379168301820755,
354
+ "learning_rate": 0.0003,
355
+ "loss": 8.197,
356
+ "step": 235
357
+ },
358
+ {
359
+ "epoch": 0.598690364826941,
360
+ "grad_norm": 0.39803043007850647,
361
+ "learning_rate": 0.0003,
362
+ "loss": 8.0836,
363
+ "step": 240
364
+ },
365
+ {
366
+ "epoch": 0.6111630807608357,
367
+ "grad_norm": 0.41287195682525635,
368
+ "learning_rate": 0.0003,
369
+ "loss": 7.9406,
370
+ "step": 245
371
+ },
372
+ {
373
+ "epoch": 0.6236357966947302,
374
+ "grad_norm": 0.3857806324958801,
375
+ "learning_rate": 0.0003,
376
+ "loss": 7.9488,
377
+ "step": 250
378
+ },
379
+ {
380
+ "epoch": 0.6361085126286249,
381
+ "grad_norm": 0.3808286488056183,
382
+ "learning_rate": 0.0003,
383
+ "loss": 7.7673,
384
+ "step": 255
385
+ },
386
+ {
387
+ "epoch": 0.6485812285625195,
388
+ "grad_norm": 0.4393250048160553,
389
+ "learning_rate": 0.0003,
390
+ "loss": 7.707,
391
+ "step": 260
392
+ },
393
+ {
394
+ "epoch": 0.6610539444964141,
395
+ "grad_norm": 0.4232034981250763,
396
+ "learning_rate": 0.0003,
397
+ "loss": 7.7852,
398
+ "step": 265
399
+ },
400
+ {
401
+ "epoch": 0.6735266604303087,
402
+ "grad_norm": 0.42222586274147034,
403
+ "learning_rate": 0.0003,
404
+ "loss": 7.6145,
405
+ "step": 270
406
+ },
407
+ {
408
+ "epoch": 0.6859993763642033,
409
+ "grad_norm": 0.35792261362075806,
410
+ "learning_rate": 0.0003,
411
+ "loss": 7.5498,
412
+ "step": 275
413
+ },
414
+ {
415
+ "epoch": 0.6984720922980979,
416
+ "grad_norm": 0.343427449464798,
417
+ "learning_rate": 0.0003,
418
+ "loss": 7.4698,
419
+ "step": 280
420
+ },
421
+ {
422
+ "epoch": 0.7109448082319925,
423
+ "grad_norm": 0.4176105856895447,
424
+ "learning_rate": 0.0003,
425
+ "loss": 7.3752,
426
+ "step": 285
427
+ },
428
+ {
429
+ "epoch": 0.7234175241658871,
430
+ "grad_norm": 0.40987178683280945,
431
+ "learning_rate": 0.0003,
432
+ "loss": 7.342,
433
+ "step": 290
434
+ },
435
+ {
436
+ "epoch": 0.7358902400997818,
437
+ "grad_norm": 0.4014261066913605,
438
+ "learning_rate": 0.0003,
439
+ "loss": 7.1609,
440
+ "step": 295
441
+ },
442
+ {
443
+ "epoch": 0.7483629560336763,
444
+ "grad_norm": 0.4236806035041809,
445
+ "learning_rate": 0.0003,
446
+ "loss": 7.1721,
447
+ "step": 300
448
+ },
449
+ {
450
+ "epoch": 0.7483629560336763,
451
+ "eval_accuracy": 0.03885043988269795,
452
+ "eval_loss": 6.619859218597412,
453
+ "eval_runtime": 18.2015,
454
+ "eval_samples_per_second": 13.735,
455
+ "eval_steps_per_second": 3.461,
456
+ "step": 300
457
+ },
458
+ {
459
+ "epoch": 0.760835671967571,
460
+ "grad_norm": 0.4133549630641937,
461
+ "learning_rate": 0.0003,
462
+ "loss": 7.1892,
463
+ "step": 305
464
+ },
465
+ {
466
+ "epoch": 0.7733083879014655,
467
+ "grad_norm": 0.44653546810150146,
468
+ "learning_rate": 0.0003,
469
+ "loss": 7.0446,
470
+ "step": 310
471
+ },
472
+ {
473
+ "epoch": 0.7857811038353602,
474
+ "grad_norm": 0.41286739706993103,
475
+ "learning_rate": 0.0003,
476
+ "loss": 6.9656,
477
+ "step": 315
478
+ },
479
+ {
480
+ "epoch": 0.7982538197692548,
481
+ "grad_norm": 0.3720580041408539,
482
+ "learning_rate": 0.0003,
483
+ "loss": 6.907,
484
+ "step": 320
485
+ },
486
+ {
487
+ "epoch": 0.8107265357031493,
488
+ "grad_norm": 0.39917078614234924,
489
+ "learning_rate": 0.0003,
490
+ "loss": 6.9853,
491
+ "step": 325
492
+ },
493
+ {
494
+ "epoch": 0.823199251637044,
495
+ "grad_norm": 0.4373719096183777,
496
+ "learning_rate": 0.0003,
497
+ "loss": 6.8592,
498
+ "step": 330
499
+ },
500
+ {
501
+ "epoch": 0.8356719675709385,
502
+ "grad_norm": 0.4183291792869568,
503
+ "learning_rate": 0.0003,
504
+ "loss": 6.7432,
505
+ "step": 335
506
+ },
507
+ {
508
+ "epoch": 0.8481446835048332,
509
+ "grad_norm": 0.40696659684181213,
510
+ "learning_rate": 0.0003,
511
+ "loss": 6.7505,
512
+ "step": 340
513
+ },
514
+ {
515
+ "epoch": 0.8606173994387278,
516
+ "grad_norm": 0.36887314915657043,
517
+ "learning_rate": 0.0003,
518
+ "loss": 6.7657,
519
+ "step": 345
520
+ },
521
+ {
522
+ "epoch": 0.8730901153726224,
523
+ "grad_norm": 0.4768717885017395,
524
+ "learning_rate": 0.0003,
525
+ "loss": 6.7173,
526
+ "step": 350
527
+ },
528
+ {
529
+ "epoch": 0.885562831306517,
530
+ "grad_norm": 0.43819448351860046,
531
+ "learning_rate": 0.0003,
532
+ "loss": 6.5465,
533
+ "step": 355
534
+ },
535
+ {
536
+ "epoch": 0.8980355472404116,
537
+ "grad_norm": 0.40145763754844666,
538
+ "learning_rate": 0.0003,
539
+ "loss": 6.512,
540
+ "step": 360
541
+ },
542
+ {
543
+ "epoch": 0.9105082631743062,
544
+ "grad_norm": 0.49852269887924194,
545
+ "learning_rate": 0.0003,
546
+ "loss": 6.5335,
547
+ "step": 365
548
+ },
549
+ {
550
+ "epoch": 0.9229809791082008,
551
+ "grad_norm": 0.454698771238327,
552
+ "learning_rate": 0.0003,
553
+ "loss": 6.4527,
554
+ "step": 370
555
+ },
556
+ {
557
+ "epoch": 0.9354536950420954,
558
+ "grad_norm": 0.4860341250896454,
559
+ "learning_rate": 0.0003,
560
+ "loss": 6.4102,
561
+ "step": 375
562
+ },
563
+ {
564
+ "epoch": 0.9479264109759901,
565
+ "grad_norm": 0.39718613028526306,
566
+ "learning_rate": 0.0003,
567
+ "loss": 6.4694,
568
+ "step": 380
569
+ },
570
+ {
571
+ "epoch": 0.9603991269098846,
572
+ "grad_norm": 0.4210009276866913,
573
+ "learning_rate": 0.0003,
574
+ "loss": 6.4807,
575
+ "step": 385
576
+ },
577
+ {
578
+ "epoch": 0.9728718428437793,
579
+ "grad_norm": 0.4482674300670624,
580
+ "learning_rate": 0.0003,
581
+ "loss": 6.414,
582
+ "step": 390
583
+ },
584
+ {
585
+ "epoch": 0.9853445587776738,
586
+ "grad_norm": 0.42889419198036194,
587
+ "learning_rate": 0.0003,
588
+ "loss": 6.3543,
589
+ "step": 395
590
+ },
591
+ {
592
+ "epoch": 0.9978172747115684,
593
+ "grad_norm": 0.5144391059875488,
594
+ "learning_rate": 0.0003,
595
+ "loss": 6.2087,
596
+ "step": 400
597
+ },
598
+ {
599
+ "epoch": 0.9978172747115684,
600
+ "eval_accuracy": 0.22513000977517106,
601
+ "eval_loss": 5.719752311706543,
602
+ "eval_runtime": 17.8865,
603
+ "eval_samples_per_second": 13.977,
604
+ "eval_steps_per_second": 3.522,
605
+ "step": 400
606
+ },
607
+ {
608
+ "epoch": 1.010289990645463,
609
+ "grad_norm": 0.6417849063873291,
610
+ "learning_rate": 0.0003,
611
+ "loss": 6.048,
612
+ "step": 405
613
+ },
614
+ {
615
+ "epoch": 1.0227627065793576,
616
+ "grad_norm": 0.5739749073982239,
617
+ "learning_rate": 0.0003,
618
+ "loss": 5.9866,
619
+ "step": 410
620
+ },
621
+ {
622
+ "epoch": 1.0352354225132523,
623
+ "grad_norm": 0.49603304266929626,
624
+ "learning_rate": 0.0003,
625
+ "loss": 5.9419,
626
+ "step": 415
627
+ },
628
+ {
629
+ "epoch": 1.047708138447147,
630
+ "grad_norm": 0.5403385162353516,
631
+ "learning_rate": 0.0003,
632
+ "loss": 5.8366,
633
+ "step": 420
634
+ },
635
+ {
636
+ "epoch": 1.0601808543810414,
637
+ "grad_norm": 0.6306777000427246,
638
+ "learning_rate": 0.0003,
639
+ "loss": 5.7657,
640
+ "step": 425
641
+ },
642
+ {
643
+ "epoch": 1.072653570314936,
644
+ "grad_norm": 0.7016925811767578,
645
+ "learning_rate": 0.0003,
646
+ "loss": 5.6619,
647
+ "step": 430
648
+ },
649
+ {
650
+ "epoch": 1.0851262862488307,
651
+ "grad_norm": 0.6606624722480774,
652
+ "learning_rate": 0.0003,
653
+ "loss": 5.6094,
654
+ "step": 435
655
+ },
656
+ {
657
+ "epoch": 1.0975990021827253,
658
+ "grad_norm": 0.7023086547851562,
659
+ "learning_rate": 0.0003,
660
+ "loss": 5.6074,
661
+ "step": 440
662
+ },
663
+ {
664
+ "epoch": 1.11007171811662,
665
+ "grad_norm": 0.8505487442016602,
666
+ "learning_rate": 0.0003,
667
+ "loss": 5.6959,
668
+ "step": 445
669
+ },
670
+ {
671
+ "epoch": 1.1225444340505144,
672
+ "grad_norm": 0.6713190674781799,
673
+ "learning_rate": 0.0003,
674
+ "loss": 5.6344,
675
+ "step": 450
676
+ },
677
+ {
678
+ "epoch": 1.135017149984409,
679
+ "grad_norm": 0.5908814668655396,
680
+ "learning_rate": 0.0003,
681
+ "loss": 5.4591,
682
+ "step": 455
683
+ },
684
+ {
685
+ "epoch": 1.1474898659183037,
686
+ "grad_norm": 0.7601476311683655,
687
+ "learning_rate": 0.0003,
688
+ "loss": 5.5622,
689
+ "step": 460
690
+ },
691
+ {
692
+ "epoch": 1.1599625818521984,
693
+ "grad_norm": 0.5737589001655579,
694
+ "learning_rate": 0.0003,
695
+ "loss": 5.4541,
696
+ "step": 465
697
+ },
698
+ {
699
+ "epoch": 1.172435297786093,
700
+ "grad_norm": 0.8831024169921875,
701
+ "learning_rate": 0.0003,
702
+ "loss": 5.4784,
703
+ "step": 470
704
+ },
705
+ {
706
+ "epoch": 1.1849080137199874,
707
+ "grad_norm": 0.8297187089920044,
708
+ "learning_rate": 0.0003,
709
+ "loss": 5.4252,
710
+ "step": 475
711
+ },
712
+ {
713
+ "epoch": 1.197380729653882,
714
+ "grad_norm": 0.857667863368988,
715
+ "learning_rate": 0.0003,
716
+ "loss": 5.3268,
717
+ "step": 480
718
+ },
719
+ {
720
+ "epoch": 1.2098534455877767,
721
+ "grad_norm": 0.8937066793441772,
722
+ "learning_rate": 0.0003,
723
+ "loss": 5.279,
724
+ "step": 485
725
+ },
726
+ {
727
+ "epoch": 1.2223261615216714,
728
+ "grad_norm": 0.784275472164154,
729
+ "learning_rate": 0.0003,
730
+ "loss": 5.3079,
731
+ "step": 490
732
+ },
733
+ {
734
+ "epoch": 1.234798877455566,
735
+ "grad_norm": 0.7549949884414673,
736
+ "learning_rate": 0.0003,
737
+ "loss": 5.3977,
738
+ "step": 495
739
+ },
740
+ {
741
+ "epoch": 1.2472715933894605,
742
+ "grad_norm": 0.7452312111854553,
743
+ "learning_rate": 0.0003,
744
+ "loss": 5.4917,
745
+ "step": 500
746
+ },
747
+ {
748
+ "epoch": 1.2472715933894605,
749
+ "eval_accuracy": 0.32684261974584555,
750
+ "eval_loss": 4.947990894317627,
751
+ "eval_runtime": 19.5683,
752
+ "eval_samples_per_second": 12.776,
753
+ "eval_steps_per_second": 3.219,
754
+ "step": 500
755
+ },
756
+ {
757
+ "epoch": 1.2597443093233551,
758
+ "grad_norm": 0.6744974255561829,
759
+ "learning_rate": 0.0003,
760
+ "loss": 5.1679,
761
+ "step": 505
762
+ },
763
+ {
764
+ "epoch": 1.2722170252572498,
765
+ "grad_norm": 1.0095832347869873,
766
+ "learning_rate": 0.0003,
767
+ "loss": 5.3918,
768
+ "step": 510
769
+ },
770
+ {
771
+ "epoch": 1.2846897411911444,
772
+ "grad_norm": 0.7461665272712708,
773
+ "learning_rate": 0.0003,
774
+ "loss": 5.2346,
775
+ "step": 515
776
+ },
777
+ {
778
+ "epoch": 1.2971624571250389,
779
+ "grad_norm": 0.88801109790802,
780
+ "learning_rate": 0.0003,
781
+ "loss": 5.2033,
782
+ "step": 520
783
+ },
784
+ {
785
+ "epoch": 1.3096351730589335,
786
+ "grad_norm": 0.7549375891685486,
787
+ "learning_rate": 0.0003,
788
+ "loss": 5.098,
789
+ "step": 525
790
+ },
791
+ {
792
+ "epoch": 1.3221078889928282,
793
+ "grad_norm": 1.1236454248428345,
794
+ "learning_rate": 0.0003,
795
+ "loss": 5.2069,
796
+ "step": 530
797
+ },
798
+ {
799
+ "epoch": 1.3345806049267228,
800
+ "grad_norm": 0.9261302947998047,
801
+ "learning_rate": 0.0003,
802
+ "loss": 5.1925,
803
+ "step": 535
804
+ },
805
+ {
806
+ "epoch": 1.3470533208606175,
807
+ "grad_norm": 0.7248057126998901,
808
+ "learning_rate": 0.0003,
809
+ "loss": 5.109,
810
+ "step": 540
811
+ },
812
+ {
813
+ "epoch": 1.3595260367945121,
814
+ "grad_norm": 0.941017210483551,
815
+ "learning_rate": 0.0003,
816
+ "loss": 5.0975,
817
+ "step": 545
818
+ },
819
+ {
820
+ "epoch": 1.3719987527284065,
821
+ "grad_norm": 0.9451349973678589,
822
+ "learning_rate": 0.0003,
823
+ "loss": 5.1825,
824
+ "step": 550
825
+ },
826
+ {
827
+ "epoch": 1.3844714686623012,
828
+ "grad_norm": 0.9956802725791931,
829
+ "learning_rate": 0.0003,
830
+ "loss": 5.1017,
831
+ "step": 555
832
+ },
833
+ {
834
+ "epoch": 1.3969441845961958,
835
+ "grad_norm": 1.0484583377838135,
836
+ "learning_rate": 0.0003,
837
+ "loss": 5.1371,
838
+ "step": 560
839
+ },
840
+ {
841
+ "epoch": 1.4094169005300905,
842
+ "grad_norm": 1.1080021858215332,
843
+ "learning_rate": 0.0003,
844
+ "loss": 5.0146,
845
+ "step": 565
846
+ },
847
+ {
848
+ "epoch": 1.421889616463985,
849
+ "grad_norm": 0.9495016932487488,
850
+ "learning_rate": 0.0003,
851
+ "loss": 5.0971,
852
+ "step": 570
853
+ },
854
+ {
855
+ "epoch": 1.4343623323978796,
856
+ "grad_norm": 0.7586097717285156,
857
+ "learning_rate": 0.0003,
858
+ "loss": 5.0336,
859
+ "step": 575
860
+ },
861
+ {
862
+ "epoch": 1.4468350483317742,
863
+ "grad_norm": 0.647396981716156,
864
+ "learning_rate": 0.0003,
865
+ "loss": 5.0119,
866
+ "step": 580
867
+ },
868
+ {
869
+ "epoch": 1.4593077642656689,
870
+ "grad_norm": 0.7189023494720459,
871
+ "learning_rate": 0.0003,
872
+ "loss": 5.0908,
873
+ "step": 585
874
+ },
875
+ {
876
+ "epoch": 1.4717804801995635,
877
+ "grad_norm": 0.9973328113555908,
878
+ "learning_rate": 0.0003,
879
+ "loss": 4.7903,
880
+ "step": 590
881
+ },
882
+ {
883
+ "epoch": 1.4842531961334582,
884
+ "grad_norm": 0.8094688057899475,
885
+ "learning_rate": 0.0003,
886
+ "loss": 5.0103,
887
+ "step": 595
888
+ },
889
+ {
890
+ "epoch": 1.4967259120673526,
891
+ "grad_norm": 1.0308438539505005,
892
+ "learning_rate": 0.0003,
893
+ "loss": 4.9408,
894
+ "step": 600
895
+ },
896
+ {
897
+ "epoch": 1.4967259120673526,
898
+ "eval_accuracy": 0.35667253176930597,
899
+ "eval_loss": 4.673036575317383,
900
+ "eval_runtime": 19.5514,
901
+ "eval_samples_per_second": 12.787,
902
+ "eval_steps_per_second": 3.222,
903
+ "step": 600
904
+ },
905
+ {
906
+ "epoch": 1.5091986280012473,
907
+ "grad_norm": 0.7587366104125977,
908
+ "learning_rate": 0.0003,
909
+ "loss": 4.9818,
910
+ "step": 605
911
+ },
912
+ {
913
+ "epoch": 1.521671343935142,
914
+ "grad_norm": 1.0271868705749512,
915
+ "learning_rate": 0.0003,
916
+ "loss": 4.9614,
917
+ "step": 610
918
+ },
919
+ {
920
+ "epoch": 1.5341440598690363,
921
+ "grad_norm": 1.061369776725769,
922
+ "learning_rate": 0.0003,
923
+ "loss": 4.8608,
924
+ "step": 615
925
+ },
926
+ {
927
+ "epoch": 1.546616775802931,
928
+ "grad_norm": 0.9442321062088013,
929
+ "learning_rate": 0.0003,
930
+ "loss": 4.9478,
931
+ "step": 620
932
+ },
933
+ {
934
+ "epoch": 1.5590894917368257,
935
+ "grad_norm": 0.8110609650611877,
936
+ "learning_rate": 0.0003,
937
+ "loss": 5.0979,
938
+ "step": 625
939
+ },
940
+ {
941
+ "epoch": 1.5715622076707203,
942
+ "grad_norm": 0.6862745881080627,
943
+ "learning_rate": 0.0003,
944
+ "loss": 4.8345,
945
+ "step": 630
946
+ },
947
+ {
948
+ "epoch": 1.584034923604615,
949
+ "grad_norm": 0.8737391233444214,
950
+ "learning_rate": 0.0003,
951
+ "loss": 4.8572,
952
+ "step": 635
953
+ },
954
+ {
955
+ "epoch": 1.5965076395385096,
956
+ "grad_norm": 0.8002131581306458,
957
+ "learning_rate": 0.0003,
958
+ "loss": 4.8072,
959
+ "step": 640
960
+ },
961
+ {
962
+ "epoch": 1.6089803554724043,
963
+ "grad_norm": 0.7860103845596313,
964
+ "learning_rate": 0.0003,
965
+ "loss": 4.8922,
966
+ "step": 645
967
+ },
968
+ {
969
+ "epoch": 1.6214530714062987,
970
+ "grad_norm": 0.9875708222389221,
971
+ "learning_rate": 0.0003,
972
+ "loss": 4.9247,
973
+ "step": 650
974
+ },
975
+ {
976
+ "epoch": 1.6339257873401933,
977
+ "grad_norm": 0.8873936533927917,
978
+ "learning_rate": 0.0003,
979
+ "loss": 4.8795,
980
+ "step": 655
981
+ },
982
+ {
983
+ "epoch": 1.646398503274088,
984
+ "grad_norm": 0.7963967323303223,
985
+ "learning_rate": 0.0003,
986
+ "loss": 4.835,
987
+ "step": 660
988
+ },
989
+ {
990
+ "epoch": 1.6588712192079824,
991
+ "grad_norm": 0.8068607449531555,
992
+ "learning_rate": 0.0003,
993
+ "loss": 4.8713,
994
+ "step": 665
995
+ },
996
+ {
997
+ "epoch": 1.671343935141877,
998
+ "grad_norm": 0.9093911647796631,
999
+ "learning_rate": 0.0003,
1000
+ "loss": 4.7725,
1001
+ "step": 670
1002
+ },
1003
+ {
1004
+ "epoch": 1.6838166510757717,
1005
+ "grad_norm": 0.7699265480041504,
1006
+ "learning_rate": 0.0003,
1007
+ "loss": 4.7502,
1008
+ "step": 675
1009
+ },
1010
+ {
1011
+ "epoch": 1.6962893670096664,
1012
+ "grad_norm": 0.7545697689056396,
1013
+ "learning_rate": 0.0003,
1014
+ "loss": 4.9555,
1015
+ "step": 680
1016
+ },
1017
+ {
1018
+ "epoch": 1.708762082943561,
1019
+ "grad_norm": 0.7571801543235779,
1020
+ "learning_rate": 0.0003,
1021
+ "loss": 4.7616,
1022
+ "step": 685
1023
+ },
1024
+ {
1025
+ "epoch": 1.7212347988774557,
1026
+ "grad_norm": 0.7757474184036255,
1027
+ "learning_rate": 0.0003,
1028
+ "loss": 4.6462,
1029
+ "step": 690
1030
+ },
1031
+ {
1032
+ "epoch": 1.7337075148113503,
1033
+ "grad_norm": 0.7473092079162598,
1034
+ "learning_rate": 0.0003,
1035
+ "loss": 4.6699,
1036
+ "step": 695
1037
+ },
1038
+ {
1039
+ "epoch": 1.7461802307452448,
1040
+ "grad_norm": 1.2531319856643677,
1041
+ "learning_rate": 0.0003,
1042
+ "loss": 4.8347,
1043
+ "step": 700
1044
+ },
1045
+ {
1046
+ "epoch": 1.7461802307452448,
1047
+ "eval_accuracy": 0.37069794721407623,
1048
+ "eval_loss": 4.498379707336426,
1049
+ "eval_runtime": 20.0355,
1050
+ "eval_samples_per_second": 12.478,
1051
+ "eval_steps_per_second": 3.144,
1052
+ "step": 700
1053
+ },
1054
+ {
1055
+ "epoch": 1.7586529466791394,
1056
+ "grad_norm": 1.3069407939910889,
1057
+ "learning_rate": 0.0003,
1058
+ "loss": 4.7338,
1059
+ "step": 705
1060
+ },
1061
+ {
1062
+ "epoch": 1.7711256626130338,
1063
+ "grad_norm": 1.1146960258483887,
1064
+ "learning_rate": 0.0003,
1065
+ "loss": 4.8758,
1066
+ "step": 710
1067
+ },
1068
+ {
1069
+ "epoch": 1.7835983785469285,
1070
+ "grad_norm": 1.0376973152160645,
1071
+ "learning_rate": 0.0003,
1072
+ "loss": 4.7604,
1073
+ "step": 715
1074
+ },
1075
+ {
1076
+ "epoch": 1.7960710944808231,
1077
+ "grad_norm": 1.2044090032577515,
1078
+ "learning_rate": 0.0003,
1079
+ "loss": 4.7472,
1080
+ "step": 720
1081
+ },
1082
+ {
1083
+ "epoch": 1.8085438104147178,
1084
+ "grad_norm": 1.0660207271575928,
1085
+ "learning_rate": 0.0003,
1086
+ "loss": 4.79,
1087
+ "step": 725
1088
+ },
1089
+ {
1090
+ "epoch": 1.8210165263486124,
1091
+ "grad_norm": 0.7932606935501099,
1092
+ "learning_rate": 0.0003,
1093
+ "loss": 4.7476,
1094
+ "step": 730
1095
+ },
1096
+ {
1097
+ "epoch": 1.833489242282507,
1098
+ "grad_norm": 0.8554738759994507,
1099
+ "learning_rate": 0.0003,
1100
+ "loss": 4.7839,
1101
+ "step": 735
1102
+ },
1103
+ {
1104
+ "epoch": 1.8459619582164017,
1105
+ "grad_norm": 1.015703797340393,
1106
+ "learning_rate": 0.0003,
1107
+ "loss": 4.7935,
1108
+ "step": 740
1109
+ },
1110
+ {
1111
+ "epoch": 1.8584346741502962,
1112
+ "grad_norm": 1.1005243062973022,
1113
+ "learning_rate": 0.0003,
1114
+ "loss": 4.7913,
1115
+ "step": 745
1116
+ },
1117
+ {
1118
+ "epoch": 1.8709073900841908,
1119
+ "grad_norm": 0.8775972127914429,
1120
+ "learning_rate": 0.0003,
1121
+ "loss": 4.5128,
1122
+ "step": 750
1123
+ },
1124
+ {
1125
+ "epoch": 1.8833801060180855,
1126
+ "grad_norm": 0.8116542100906372,
1127
+ "learning_rate": 0.0003,
1128
+ "loss": 4.6496,
1129
+ "step": 755
1130
+ },
1131
+ {
1132
+ "epoch": 1.89585282195198,
1133
+ "grad_norm": 0.7614642381668091,
1134
+ "learning_rate": 0.0003,
1135
+ "loss": 4.7695,
1136
+ "step": 760
1137
+ },
1138
+ {
1139
+ "epoch": 1.9083255378858746,
1140
+ "grad_norm": 1.0064287185668945,
1141
+ "learning_rate": 0.0003,
1142
+ "loss": 4.7929,
1143
+ "step": 765
1144
+ },
1145
+ {
1146
+ "epoch": 1.9207982538197692,
1147
+ "grad_norm": 0.7342740297317505,
1148
+ "learning_rate": 0.0003,
1149
+ "loss": 4.6711,
1150
+ "step": 770
1151
+ },
1152
+ {
1153
+ "epoch": 1.9332709697536639,
1154
+ "grad_norm": 0.9723834991455078,
1155
+ "learning_rate": 0.0003,
1156
+ "loss": 4.6212,
1157
+ "step": 775
1158
+ },
1159
+ {
1160
+ "epoch": 1.9457436856875585,
1161
+ "grad_norm": 1.20729398727417,
1162
+ "learning_rate": 0.0003,
1163
+ "loss": 4.6513,
1164
+ "step": 780
1165
+ },
1166
+ {
1167
+ "epoch": 1.9582164016214532,
1168
+ "grad_norm": 0.7920907735824585,
1169
+ "learning_rate": 0.0003,
1170
+ "loss": 4.6264,
1171
+ "step": 785
1172
+ },
1173
+ {
1174
+ "epoch": 1.9706891175553478,
1175
+ "grad_norm": 0.6307650804519653,
1176
+ "learning_rate": 0.0003,
1177
+ "loss": 4.6481,
1178
+ "step": 790
1179
+ },
1180
+ {
1181
+ "epoch": 1.9831618334892422,
1182
+ "grad_norm": 0.8942980766296387,
1183
+ "learning_rate": 0.0003,
1184
+ "loss": 4.6598,
1185
+ "step": 795
1186
+ },
1187
+ {
1188
+ "epoch": 1.995634549423137,
1189
+ "grad_norm": 0.7046281099319458,
1190
+ "learning_rate": 0.0003,
1191
+ "loss": 4.7023,
1192
+ "step": 800
1193
+ },
1194
+ {
1195
+ "epoch": 1.995634549423137,
1196
+ "eval_accuracy": 0.3789325513196481,
1197
+ "eval_loss": 4.358436584472656,
1198
+ "eval_runtime": 20.1663,
1199
+ "eval_samples_per_second": 12.397,
1200
+ "eval_steps_per_second": 3.124,
1201
+ "step": 800
1202
+ },
1203
+ {
1204
+ "epoch": 1.995634549423137,
1205
+ "step": 800,
1206
+ "total_flos": 6.441101073108173e+16,
1207
+ "train_loss": 8.280340445041656,
1208
+ "train_runtime": 18888.2342,
1209
+ "train_samples_per_second": 5.433,
1210
+ "train_steps_per_second": 0.042
1211
+ }
1212
+ ],
1213
+ "logging_steps": 5,
1214
+ "max_steps": 800,
1215
+ "num_input_tokens_seen": 0,
1216
+ "num_train_epochs": 2,
1217
+ "save_steps": 100,
1218
+ "total_flos": 6.441101073108173e+16,
1219
+ "train_batch_size": 4,
1220
+ "trial_name": null,
1221
+ "trial_params": null
1222
+ }
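Most of trainer_state.json is the log_history array: a training record (loss, grad_norm, learning_rate, step) every 5 steps, an evaluation record (eval_loss, eval_accuracy, runtime stats) every 100 steps, and a final summary entry. A minimal sketch for pulling the train and eval loss curves out of a local copy of this file (the path is illustrative):

```python
import json

# Point this at a local copy of the committed trainer_state.json.
with open("trainer_state.json") as f:
    state = json.load(f)

# Training records carry "loss"; evaluation records carry "eval_loss".
train_curve = [(r["step"], r["loss"]) for r in state["log_history"] if "loss" in r]
eval_curve = [(r["step"], r["eval_loss"]) for r in state["log_history"] if "eval_loss" in r]

print(train_curve[-1])  # (800, 4.7023)
print(eval_curve[-1])   # (800, 4.358436584472656)
```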