Commit cfa9ac9 by khaingsmon
1 Parent(s): 70120f2

cheers again

Files changed (2)
  1. README.md +1 -1
  2. trainer_state.json +1180 -0
README.md CHANGED
@@ -15,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->
15
 
16
  # whisper2
17
 
18
- This model is a fine-tuned version of [openai/whisper-tiny.en](https://huggingface.co/openai/whisper-tiny.en) on an unknown dataset.
18
+ This model is a fine-tuned version of [openai/whisper-tiny.en](https://huggingface.co/openai/whisper-tiny.en) on the tiny dataset.
19
  It achieves the following results on the evaluation set:
20
  - Loss: 0.5233
21
  - Wer: 31.1083
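
The `Wer` figure in the model card is a word error rate, and its magnitude suggests it is reported as a percentage. The metric code used for this run is not part of the commit, so the following is only a minimal sketch of how such a number can be computed, assuming whitespace tokenization and no text normalization. The function name `word_error_rate` is illustrative, not taken from the training script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage.

    Assumption: words are split on whitespace and compared exactly
    (no casing or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(example, 2))
```

On a whole evaluation set, the usual convention is to sum the edit counts and the reference lengths across utterances before dividing, rather than averaging per-utterance rates.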
trainer_state.json ADDED
@@ -0,0 +1,1180 @@
1
+ {
2
+ "best_metric": 28.05415617128463,
3
+ "best_model_checkpoint": "whisper2/checkpoint-430",
4
+ "epoch": 7.042253521126761,
5
+ "eval_steps": 10,
6
+ "global_step": 500,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.07042253521126761,
13
+ "grad_norm": 43.82194137573242,
14
+ "learning_rate": 1.0000000000000001e-07,
15
+ "loss": 3.9547,
16
+ "step": 5
17
+ },
18
+ {
19
+ "epoch": 0.14084507042253522,
20
+ "grad_norm": 45.53117370605469,
21
+ "learning_rate": 2.0000000000000002e-07,
22
+ "loss": 3.9553,
23
+ "step": 10
24
+ },
25
+ {
26
+ "epoch": 0.14084507042253522,
27
+ "eval_loss": 3.964555501937866,
28
+ "eval_runtime": 264.1292,
29
+ "eval_samples_per_second": 1.893,
30
+ "eval_steps_per_second": 0.239,
31
+ "eval_wer": 74.87405541561712,
32
+ "step": 10
33
+ },
34
+ {
35
+ "epoch": 0.2112676056338028,
36
+ "grad_norm": 46.162776947021484,
37
+ "learning_rate": 3.0000000000000004e-07,
38
+ "loss": 3.882,
39
+ "step": 15
40
+ },
41
+ {
42
+ "epoch": 0.28169014084507044,
43
+ "grad_norm": 46.07596206665039,
44
+ "learning_rate": 4.0000000000000003e-07,
45
+ "loss": 3.9548,
46
+ "step": 20
47
+ },
48
+ {
49
+ "epoch": 0.28169014084507044,
50
+ "eval_loss": 3.8793957233428955,
51
+ "eval_runtime": 256.6948,
52
+ "eval_samples_per_second": 1.948,
53
+ "eval_steps_per_second": 0.245,
54
+ "eval_wer": 77.67632241813602,
55
+ "step": 20
56
+ },
57
+ {
58
+ "epoch": 0.352112676056338,
59
+ "grad_norm": 45.13657760620117,
60
+ "learning_rate": 5.000000000000001e-07,
61
+ "loss": 3.9469,
62
+ "step": 25
63
+ },
64
+ {
65
+ "epoch": 0.4225352112676056,
66
+ "grad_norm": 44.565940856933594,
67
+ "learning_rate": 6.000000000000001e-07,
68
+ "loss": 3.8127,
69
+ "step": 30
70
+ },
71
+ {
72
+ "epoch": 0.4225352112676056,
73
+ "eval_loss": 3.740476608276367,
74
+ "eval_runtime": 257.3378,
75
+ "eval_samples_per_second": 1.943,
76
+ "eval_steps_per_second": 0.245,
77
+ "eval_wer": 76.4168765743073,
78
+ "step": 30
79
+ },
80
+ {
81
+ "epoch": 0.49295774647887325,
82
+ "grad_norm": 44.24871826171875,
83
+ "learning_rate": 7.000000000000001e-07,
84
+ "loss": 3.7507,
85
+ "step": 35
86
+ },
87
+ {
88
+ "epoch": 0.5633802816901409,
89
+ "grad_norm": 42.1717529296875,
90
+ "learning_rate": 8.000000000000001e-07,
91
+ "loss": 3.6178,
92
+ "step": 40
93
+ },
94
+ {
95
+ "epoch": 0.5633802816901409,
96
+ "eval_loss": 3.5547332763671875,
97
+ "eval_runtime": 256.8157,
98
+ "eval_samples_per_second": 1.947,
99
+ "eval_steps_per_second": 0.245,
100
+ "eval_wer": 75.31486146095719,
101
+ "step": 40
102
+ },
103
+ {
104
+ "epoch": 0.6338028169014085,
105
+ "grad_norm": 44.667205810546875,
106
+ "learning_rate": 9.000000000000001e-07,
107
+ "loss": 3.4825,
108
+ "step": 45
109
+ },
110
+ {
111
+ "epoch": 0.704225352112676,
112
+ "grad_norm": 43.76979064941406,
113
+ "learning_rate": 1.0000000000000002e-06,
114
+ "loss": 3.3992,
115
+ "step": 50
116
+ },
117
+ {
118
+ "epoch": 0.704225352112676,
119
+ "eval_loss": 3.323503255844116,
120
+ "eval_runtime": 255.1809,
121
+ "eval_samples_per_second": 1.959,
122
+ "eval_steps_per_second": 0.247,
123
+ "eval_wer": 70.27707808564232,
124
+ "step": 50
125
+ },
126
+ {
127
+ "epoch": 0.7746478873239436,
128
+ "grad_norm": 41.28179168701172,
129
+ "learning_rate": 1.1e-06,
130
+ "loss": 3.3124,
131
+ "step": 55
132
+ },
133
+ {
134
+ "epoch": 0.8450704225352113,
135
+ "grad_norm": 40.813392639160156,
136
+ "learning_rate": 1.2000000000000002e-06,
137
+ "loss": 3.1416,
138
+ "step": 60
139
+ },
140
+ {
141
+ "epoch": 0.8450704225352113,
142
+ "eval_loss": 3.040179491043091,
143
+ "eval_runtime": 255.4069,
144
+ "eval_samples_per_second": 1.958,
145
+ "eval_steps_per_second": 0.247,
146
+ "eval_wer": 67.85264483627203,
147
+ "step": 60
148
+ },
149
+ {
150
+ "epoch": 0.9154929577464789,
151
+ "grad_norm": 40.00282287597656,
152
+ "learning_rate": 1.3e-06,
153
+ "loss": 2.88,
154
+ "step": 65
155
+ },
156
+ {
157
+ "epoch": 0.9859154929577465,
158
+ "grad_norm": 40.60588455200195,
159
+ "learning_rate": 1.4000000000000001e-06,
160
+ "loss": 2.8052,
161
+ "step": 70
162
+ },
163
+ {
164
+ "epoch": 0.9859154929577465,
165
+ "eval_loss": 2.6852359771728516,
166
+ "eval_runtime": 254.3541,
167
+ "eval_samples_per_second": 1.966,
168
+ "eval_steps_per_second": 0.248,
169
+ "eval_wer": 65.96347607052897,
170
+ "step": 70
171
+ },
172
+ {
173
+ "epoch": 1.056338028169014,
174
+ "grad_norm": 44.205726623535156,
175
+ "learning_rate": 1.5e-06,
176
+ "loss": 2.4894,
177
+ "step": 75
178
+ },
179
+ {
180
+ "epoch": 1.1267605633802817,
181
+ "grad_norm": 40.45851516723633,
182
+ "learning_rate": 1.6000000000000001e-06,
183
+ "loss": 2.3513,
184
+ "step": 80
185
+ },
186
+ {
187
+ "epoch": 1.1267605633802817,
188
+ "eval_loss": 2.223541021347046,
189
+ "eval_runtime": 256.3144,
190
+ "eval_samples_per_second": 1.951,
191
+ "eval_steps_per_second": 0.246,
192
+ "eval_wer": 68.3249370277078,
193
+ "step": 80
194
+ },
195
+ {
196
+ "epoch": 1.1971830985915493,
197
+ "grad_norm": 37.049591064453125,
198
+ "learning_rate": 1.7000000000000002e-06,
199
+ "loss": 2.2021,
200
+ "step": 85
201
+ },
202
+ {
203
+ "epoch": 1.267605633802817,
204
+ "grad_norm": 32.15092468261719,
205
+ "learning_rate": 1.8000000000000001e-06,
206
+ "loss": 1.893,
207
+ "step": 90
208
+ },
209
+ {
210
+ "epoch": 1.267605633802817,
211
+ "eval_loss": 1.6707711219787598,
212
+ "eval_runtime": 254.2495,
213
+ "eval_samples_per_second": 1.967,
214
+ "eval_steps_per_second": 0.248,
215
+ "eval_wer": 63.822418136020154,
216
+ "step": 90
217
+ },
218
+ {
219
+ "epoch": 1.3380281690140845,
220
+ "grad_norm": 29.11300277709961,
221
+ "learning_rate": 1.9000000000000002e-06,
222
+ "loss": 1.6227,
223
+ "step": 95
224
+ },
225
+ {
226
+ "epoch": 1.408450704225352,
227
+ "grad_norm": 19.466663360595703,
228
+ "learning_rate": 2.0000000000000003e-06,
229
+ "loss": 1.2871,
230
+ "step": 100
231
+ },
232
+ {
233
+ "epoch": 1.408450704225352,
234
+ "eval_loss": 1.164486050605774,
235
+ "eval_runtime": 254.5126,
236
+ "eval_samples_per_second": 1.965,
237
+ "eval_steps_per_second": 0.248,
238
+ "eval_wer": 63.25566750629723,
239
+ "step": 100
240
+ },
241
+ {
242
+ "epoch": 1.4788732394366197,
243
+ "grad_norm": 15.238794326782227,
244
+ "learning_rate": 2.1000000000000002e-06,
245
+ "loss": 1.09,
246
+ "step": 105
247
+ },
248
+ {
249
+ "epoch": 1.5492957746478875,
250
+ "grad_norm": 10.725071907043457,
251
+ "learning_rate": 2.2e-06,
252
+ "loss": 0.9146,
253
+ "step": 110
254
+ },
255
+ {
256
+ "epoch": 1.5492957746478875,
257
+ "eval_loss": 0.8784648776054382,
258
+ "eval_runtime": 256.185,
259
+ "eval_samples_per_second": 1.952,
260
+ "eval_steps_per_second": 0.246,
261
+ "eval_wer": 56.83249370277078,
262
+ "step": 110
263
+ },
264
+ {
265
+ "epoch": 1.619718309859155,
266
+ "grad_norm": 7.8202009201049805,
267
+ "learning_rate": 2.3000000000000004e-06,
268
+ "loss": 0.8882,
269
+ "step": 115
270
+ },
271
+ {
272
+ "epoch": 1.6901408450704225,
273
+ "grad_norm": 8.60835075378418,
274
+ "learning_rate": 2.4000000000000003e-06,
275
+ "loss": 0.8044,
276
+ "step": 120
277
+ },
278
+ {
279
+ "epoch": 1.6901408450704225,
280
+ "eval_loss": 0.7906607985496521,
281
+ "eval_runtime": 255.9613,
282
+ "eval_samples_per_second": 1.953,
283
+ "eval_steps_per_second": 0.246,
284
+ "eval_wer": 46.977329974811084,
285
+ "step": 120
286
+ },
287
+ {
288
+ "epoch": 1.76056338028169,
289
+ "grad_norm": 9.780821800231934,
290
+ "learning_rate": 2.5e-06,
291
+ "loss": 0.6849,
292
+ "step": 125
293
+ },
294
+ {
295
+ "epoch": 1.8309859154929577,
296
+ "grad_norm": 9.33056926727295,
297
+ "learning_rate": 2.6e-06,
298
+ "loss": 0.6634,
299
+ "step": 130
300
+ },
301
+ {
302
+ "epoch": 1.8309859154929577,
303
+ "eval_loss": 0.7425487637519836,
304
+ "eval_runtime": 255.5846,
305
+ "eval_samples_per_second": 1.956,
306
+ "eval_steps_per_second": 0.246,
307
+ "eval_wer": 47.48110831234257,
308
+ "step": 130
309
+ },
310
+ {
311
+ "epoch": 1.9014084507042255,
312
+ "grad_norm": 8.966361999511719,
313
+ "learning_rate": 2.7000000000000004e-06,
314
+ "loss": 0.7421,
315
+ "step": 135
316
+ },
317
+ {
318
+ "epoch": 1.971830985915493,
319
+ "grad_norm": 7.636435031890869,
320
+ "learning_rate": 2.8000000000000003e-06,
321
+ "loss": 0.6722,
322
+ "step": 140
323
+ },
324
+ {
325
+ "epoch": 1.971830985915493,
326
+ "eval_loss": 0.7099979519844055,
327
+ "eval_runtime": 253.8483,
328
+ "eval_samples_per_second": 1.97,
329
+ "eval_steps_per_second": 0.248,
330
+ "eval_wer": 45.90680100755667,
331
+ "step": 140
332
+ },
333
+ {
334
+ "epoch": 2.0422535211267605,
335
+ "grad_norm": 8.085705757141113,
336
+ "learning_rate": 2.9e-06,
337
+ "loss": 0.6865,
338
+ "step": 145
339
+ },
340
+ {
341
+ "epoch": 2.112676056338028,
342
+ "grad_norm": 8.131012916564941,
343
+ "learning_rate": 3e-06,
344
+ "loss": 0.6823,
345
+ "step": 150
346
+ },
347
+ {
348
+ "epoch": 2.112676056338028,
349
+ "eval_loss": 0.6854478120803833,
350
+ "eval_runtime": 255.8245,
351
+ "eval_samples_per_second": 1.954,
352
+ "eval_steps_per_second": 0.246,
353
+ "eval_wer": 42.41183879093199,
354
+ "step": 150
355
+ },
356
+ {
357
+ "epoch": 2.183098591549296,
358
+ "grad_norm": 8.054609298706055,
359
+ "learning_rate": 3.1000000000000004e-06,
360
+ "loss": 0.6001,
361
+ "step": 155
362
+ },
363
+ {
364
+ "epoch": 2.2535211267605635,
365
+ "grad_norm": 6.9759063720703125,
366
+ "learning_rate": 3.2000000000000003e-06,
367
+ "loss": 0.5802,
368
+ "step": 160
369
+ },
370
+ {
371
+ "epoch": 2.2535211267605635,
372
+ "eval_loss": 0.6659273505210876,
373
+ "eval_runtime": 254.855,
374
+ "eval_samples_per_second": 1.962,
375
+ "eval_steps_per_second": 0.247,
376
+ "eval_wer": 40.42821158690176,
377
+ "step": 160
378
+ },
379
+ {
380
+ "epoch": 2.323943661971831,
381
+ "grad_norm": 8.077522277832031,
382
+ "learning_rate": 3.3000000000000006e-06,
383
+ "loss": 0.6065,
384
+ "step": 165
385
+ },
386
+ {
387
+ "epoch": 2.3943661971830985,
388
+ "grad_norm": 6.6878228187561035,
389
+ "learning_rate": 3.4000000000000005e-06,
390
+ "loss": 0.6084,
391
+ "step": 170
392
+ },
393
+ {
394
+ "epoch": 2.3943661971830985,
395
+ "eval_loss": 0.6503352522850037,
396
+ "eval_runtime": 253.7567,
397
+ "eval_samples_per_second": 1.97,
398
+ "eval_steps_per_second": 0.248,
399
+ "eval_wer": 40.8375314861461,
400
+ "step": 170
401
+ },
402
+ {
403
+ "epoch": 2.464788732394366,
404
+ "grad_norm": 7.941697597503662,
405
+ "learning_rate": 3.5e-06,
406
+ "loss": 0.5972,
407
+ "step": 175
408
+ },
409
+ {
410
+ "epoch": 2.535211267605634,
411
+ "grad_norm": 7.986533164978027,
412
+ "learning_rate": 3.6000000000000003e-06,
413
+ "loss": 0.6038,
414
+ "step": 180
415
+ },
416
+ {
417
+ "epoch": 2.535211267605634,
418
+ "eval_loss": 0.6345599889755249,
419
+ "eval_runtime": 254.9306,
420
+ "eval_samples_per_second": 1.961,
421
+ "eval_steps_per_second": 0.247,
422
+ "eval_wer": 41.49874055415617,
423
+ "step": 180
424
+ },
425
+ {
426
+ "epoch": 2.6056338028169015,
427
+ "grad_norm": 6.744418144226074,
428
+ "learning_rate": 3.7e-06,
429
+ "loss": 0.5007,
430
+ "step": 185
431
+ },
432
+ {
433
+ "epoch": 2.676056338028169,
434
+ "grad_norm": 6.323821544647217,
435
+ "learning_rate": 3.8000000000000005e-06,
436
+ "loss": 0.5095,
437
+ "step": 190
438
+ },
439
+ {
440
+ "epoch": 2.676056338028169,
441
+ "eval_loss": 0.6247134804725647,
442
+ "eval_runtime": 257.1561,
443
+ "eval_samples_per_second": 1.944,
444
+ "eval_steps_per_second": 0.245,
445
+ "eval_wer": 42.03400503778337,
446
+ "step": 190
447
+ },
448
+ {
449
+ "epoch": 2.7464788732394365,
450
+ "grad_norm": 6.979465961456299,
451
+ "learning_rate": 3.900000000000001e-06,
452
+ "loss": 0.5943,
453
+ "step": 195
454
+ },
455
+ {
456
+ "epoch": 2.816901408450704,
457
+ "grad_norm": 6.675357818603516,
458
+ "learning_rate": 4.000000000000001e-06,
459
+ "loss": 0.5251,
460
+ "step": 200
461
+ },
462
+ {
463
+ "epoch": 2.816901408450704,
464
+ "eval_loss": 0.6154741644859314,
465
+ "eval_runtime": 255.2235,
466
+ "eval_samples_per_second": 1.959,
467
+ "eval_steps_per_second": 0.247,
468
+ "eval_wer": 39.357682619647356,
469
+ "step": 200
470
+ },
471
+ {
472
+ "epoch": 2.887323943661972,
473
+ "grad_norm": 6.802981853485107,
474
+ "learning_rate": 4.1e-06,
475
+ "loss": 0.5528,
476
+ "step": 205
477
+ },
478
+ {
479
+ "epoch": 2.9577464788732395,
480
+ "grad_norm": 6.836462497711182,
481
+ "learning_rate": 4.2000000000000004e-06,
482
+ "loss": 0.5699,
483
+ "step": 210
484
+ },
485
+ {
486
+ "epoch": 2.9577464788732395,
487
+ "eval_loss": 0.6045908331871033,
488
+ "eval_runtime": 254.5675,
489
+ "eval_samples_per_second": 1.964,
490
+ "eval_steps_per_second": 0.247,
491
+ "eval_wer": 38.350125944584384,
492
+ "step": 210
493
+ },
494
+ {
495
+ "epoch": 3.028169014084507,
496
+ "grad_norm": 6.114952087402344,
497
+ "learning_rate": 4.3e-06,
498
+ "loss": 0.478,
499
+ "step": 215
500
+ },
501
+ {
502
+ "epoch": 3.0985915492957745,
503
+ "grad_norm": 5.803236961364746,
504
+ "learning_rate": 4.4e-06,
505
+ "loss": 0.4839,
506
+ "step": 220
507
+ },
508
+ {
509
+ "epoch": 3.0985915492957745,
510
+ "eval_loss": 0.5944731831550598,
511
+ "eval_runtime": 254.5629,
512
+ "eval_samples_per_second": 1.964,
513
+ "eval_steps_per_second": 0.247,
514
+ "eval_wer": 37.27959697732997,
515
+ "step": 220
516
+ },
517
+ {
518
+ "epoch": 3.169014084507042,
519
+ "grad_norm": 5.95841646194458,
520
+ "learning_rate": 4.5e-06,
521
+ "loss": 0.4982,
522
+ "step": 225
523
+ },
524
+ {
525
+ "epoch": 3.23943661971831,
526
+ "grad_norm": 6.992792129516602,
527
+ "learning_rate": 4.600000000000001e-06,
528
+ "loss": 0.4843,
529
+ "step": 230
530
+ },
531
+ {
532
+ "epoch": 3.23943661971831,
533
+ "eval_loss": 0.5861312747001648,
534
+ "eval_runtime": 257.6573,
535
+ "eval_samples_per_second": 1.941,
536
+ "eval_steps_per_second": 0.245,
537
+ "eval_wer": 48.394206549118394,
538
+ "step": 230
539
+ },
540
+ {
541
+ "epoch": 3.3098591549295775,
542
+ "grad_norm": 5.872804164886475,
543
+ "learning_rate": 4.7e-06,
544
+ "loss": 0.4471,
545
+ "step": 235
546
+ },
547
+ {
548
+ "epoch": 3.380281690140845,
549
+ "grad_norm": 6.013182640075684,
550
+ "learning_rate": 4.800000000000001e-06,
551
+ "loss": 0.4538,
552
+ "step": 240
553
+ },
554
+ {
555
+ "epoch": 3.380281690140845,
556
+ "eval_loss": 0.5793710350990295,
557
+ "eval_runtime": 254.563,
558
+ "eval_samples_per_second": 1.964,
559
+ "eval_steps_per_second": 0.247,
560
+ "eval_wer": 34.66624685138539,
561
+ "step": 240
562
+ },
563
+ {
564
+ "epoch": 3.4507042253521125,
565
+ "grad_norm": 6.745495319366455,
566
+ "learning_rate": 4.9000000000000005e-06,
567
+ "loss": 0.4932,
568
+ "step": 245
569
+ },
570
+ {
571
+ "epoch": 3.52112676056338,
572
+ "grad_norm": 5.320774078369141,
573
+ "learning_rate": 5e-06,
574
+ "loss": 0.4741,
575
+ "step": 250
576
+ },
577
+ {
578
+ "epoch": 3.52112676056338,
579
+ "eval_loss": 0.5736850500106812,
580
+ "eval_runtime": 255.3883,
581
+ "eval_samples_per_second": 1.958,
582
+ "eval_steps_per_second": 0.247,
583
+ "eval_wer": 33.816120906801004,
584
+ "step": 250
585
+ },
586
+ {
587
+ "epoch": 3.591549295774648,
588
+ "grad_norm": 6.753683090209961,
589
+ "learning_rate": 5.1e-06,
590
+ "loss": 0.5025,
591
+ "step": 255
592
+ },
593
+ {
594
+ "epoch": 3.6619718309859155,
595
+ "grad_norm": 7.474066257476807,
596
+ "learning_rate": 5.2e-06,
597
+ "loss": 0.4542,
598
+ "step": 260
599
+ },
600
+ {
601
+ "epoch": 3.6619718309859155,
602
+ "eval_loss": 0.5662725567817688,
603
+ "eval_runtime": 255.3299,
604
+ "eval_samples_per_second": 1.958,
605
+ "eval_steps_per_second": 0.247,
606
+ "eval_wer": 41.97103274559194,
607
+ "step": 260
608
+ },
609
+ {
610
+ "epoch": 3.732394366197183,
611
+ "grad_norm": 5.626581192016602,
612
+ "learning_rate": 5.300000000000001e-06,
613
+ "loss": 0.4639,
614
+ "step": 265
615
+ },
616
+ {
617
+ "epoch": 3.802816901408451,
618
+ "grad_norm": 5.518383026123047,
619
+ "learning_rate": 5.400000000000001e-06,
620
+ "loss": 0.4163,
621
+ "step": 270
622
+ },
623
+ {
624
+ "epoch": 3.802816901408451,
625
+ "eval_loss": 0.5622957944869995,
626
+ "eval_runtime": 256.1828,
627
+ "eval_samples_per_second": 1.952,
628
+ "eval_steps_per_second": 0.246,
629
+ "eval_wer": 46.095717884130984,
630
+ "step": 270
631
+ },
632
+ {
633
+ "epoch": 3.873239436619718,
634
+ "grad_norm": 6.132260799407959,
635
+ "learning_rate": 5.500000000000001e-06,
636
+ "loss": 0.3922,
637
+ "step": 275
638
+ },
639
+ {
640
+ "epoch": 3.943661971830986,
641
+ "grad_norm": 5.8338942527771,
642
+ "learning_rate": 5.600000000000001e-06,
643
+ "loss": 0.3496,
644
+ "step": 280
645
+ },
646
+ {
647
+ "epoch": 3.943661971830986,
648
+ "eval_loss": 0.560535192489624,
649
+ "eval_runtime": 255.0016,
650
+ "eval_samples_per_second": 1.961,
651
+ "eval_steps_per_second": 0.247,
652
+ "eval_wer": 42.2544080604534,
653
+ "step": 280
654
+ },
655
+ {
656
+ "epoch": 4.014084507042254,
657
+ "grad_norm": 4.769192695617676,
658
+ "learning_rate": 5.7e-06,
659
+ "loss": 0.4389,
660
+ "step": 285
661
+ },
662
+ {
663
+ "epoch": 4.084507042253521,
664
+ "grad_norm": 5.79905366897583,
665
+ "learning_rate": 5.8e-06,
666
+ "loss": 0.3835,
667
+ "step": 290
668
+ },
669
+ {
670
+ "epoch": 4.084507042253521,
671
+ "eval_loss": 0.5556859374046326,
672
+ "eval_runtime": 255.3987,
673
+ "eval_samples_per_second": 1.958,
674
+ "eval_steps_per_second": 0.247,
675
+ "eval_wer": 41.656171284634766,
676
+ "step": 290
677
+ },
678
+ {
679
+ "epoch": 4.154929577464789,
680
+ "grad_norm": 5.353799819946289,
681
+ "learning_rate": 5.9e-06,
682
+ "loss": 0.385,
683
+ "step": 295
684
+ },
685
+ {
686
+ "epoch": 4.225352112676056,
687
+ "grad_norm": 5.164504528045654,
688
+ "learning_rate": 6e-06,
689
+ "loss": 0.3462,
690
+ "step": 300
691
+ },
692
+ {
693
+ "epoch": 4.225352112676056,
694
+ "eval_loss": 0.550672173500061,
695
+ "eval_runtime": 255.5806,
696
+ "eval_samples_per_second": 1.956,
697
+ "eval_steps_per_second": 0.246,
698
+ "eval_wer": 36.39798488664987,
699
+ "step": 300
700
+ },
701
+ {
702
+ "epoch": 4.295774647887324,
703
+ "grad_norm": 5.903466701507568,
704
+ "learning_rate": 6.1e-06,
705
+ "loss": 0.3733,
706
+ "step": 305
707
+ },
708
+ {
709
+ "epoch": 4.366197183098592,
710
+ "grad_norm": 6.308957099914551,
711
+ "learning_rate": 6.200000000000001e-06,
712
+ "loss": 0.3133,
713
+ "step": 310
714
+ },
715
+ {
716
+ "epoch": 4.366197183098592,
717
+ "eval_loss": 0.5452054738998413,
718
+ "eval_runtime": 255.9204,
719
+ "eval_samples_per_second": 1.954,
720
+ "eval_steps_per_second": 0.246,
721
+ "eval_wer": 42.56926952141058,
722
+ "step": 310
723
+ },
724
+ {
725
+ "epoch": 4.436619718309859,
726
+ "grad_norm": 4.767759323120117,
727
+ "learning_rate": 6.300000000000001e-06,
728
+ "loss": 0.3544,
729
+ "step": 315
730
+ },
731
+ {
732
+ "epoch": 4.507042253521127,
733
+ "grad_norm": 5.711643695831299,
734
+ "learning_rate": 6.4000000000000006e-06,
735
+ "loss": 0.3638,
736
+ "step": 320
737
+ },
738
+ {
739
+ "epoch": 4.507042253521127,
740
+ "eval_loss": 0.5434854030609131,
741
+ "eval_runtime": 253.7024,
742
+ "eval_samples_per_second": 1.971,
743
+ "eval_steps_per_second": 0.248,
744
+ "eval_wer": 35.957178841309826,
745
+ "step": 320
746
+ },
747
+ {
748
+ "epoch": 4.577464788732394,
749
+ "grad_norm": 5.667789936065674,
750
+ "learning_rate": 6.5000000000000004e-06,
751
+ "loss": 0.3974,
752
+ "step": 325
753
+ },
754
+ {
755
+ "epoch": 4.647887323943662,
756
+ "grad_norm": 6.108503341674805,
757
+ "learning_rate": 6.600000000000001e-06,
758
+ "loss": 0.3826,
759
+ "step": 330
760
+ },
761
+ {
762
+ "epoch": 4.647887323943662,
763
+ "eval_loss": 0.5396420955657959,
764
+ "eval_runtime": 252.7138,
765
+ "eval_samples_per_second": 1.979,
766
+ "eval_steps_per_second": 0.249,
767
+ "eval_wer": 31.95843828715365,
768
+ "step": 330
769
+ },
770
+ {
771
+ "epoch": 4.71830985915493,
772
+ "grad_norm": 5.889377117156982,
773
+ "learning_rate": 6.700000000000001e-06,
774
+ "loss": 0.3813,
775
+ "step": 335
776
+ },
777
+ {
778
+ "epoch": 4.788732394366197,
779
+ "grad_norm": 5.469658851623535,
780
+ "learning_rate": 6.800000000000001e-06,
781
+ "loss": 0.3581,
782
+ "step": 340
783
+ },
784
+ {
785
+ "epoch": 4.788732394366197,
786
+ "eval_loss": 0.5361477136611938,
787
+ "eval_runtime": 251.8728,
788
+ "eval_samples_per_second": 1.985,
789
+ "eval_steps_per_second": 0.25,
790
+ "eval_wer": 33.78463476070529,
791
+ "step": 340
792
+ },
793
+ {
794
+ "epoch": 4.859154929577465,
795
+ "grad_norm": 5.188804626464844,
796
+ "learning_rate": 6.9e-06,
797
+ "loss": 0.3351,
798
+ "step": 345
799
+ },
800
+ {
801
+ "epoch": 4.929577464788732,
802
+ "grad_norm": 5.103167533874512,
803
+ "learning_rate": 7e-06,
804
+ "loss": 0.3127,
805
+ "step": 350
806
+ },
807
+ {
808
+ "epoch": 4.929577464788732,
809
+ "eval_loss": 0.5339432954788208,
810
+ "eval_runtime": 252.7571,
811
+ "eval_samples_per_second": 1.978,
812
+ "eval_steps_per_second": 0.249,
813
+ "eval_wer": 37.342569269521405,
814
+ "step": 350
815
+ },
816
+ {
817
+ "epoch": 5.0,
818
+ "grad_norm": 9.485374450683594,
819
+ "learning_rate": 7.100000000000001e-06,
820
+ "loss": 0.3265,
821
+ "step": 355
822
+ },
823
+ {
824
+ "epoch": 5.070422535211268,
825
+ "grad_norm": 5.010895252227783,
826
+ "learning_rate": 7.2000000000000005e-06,
827
+ "loss": 0.2988,
828
+ "step": 360
829
+ },
830
+ {
831
+ "epoch": 5.070422535211268,
832
+ "eval_loss": 0.5347580909729004,
833
+ "eval_runtime": 253.3761,
834
+ "eval_samples_per_second": 1.973,
835
+ "eval_steps_per_second": 0.249,
836
+ "eval_wer": 38.727959697733,
837
+ "step": 360
838
+ },
839
+ {
840
+ "epoch": 5.140845070422535,
841
+ "grad_norm": 5.113419055938721,
842
+ "learning_rate": 7.3e-06,
843
+ "loss": 0.2953,
844
+ "step": 365
845
+ },
846
+ {
847
+ "epoch": 5.211267605633803,
848
+ "grad_norm": 5.5772247314453125,
849
+ "learning_rate": 7.4e-06,
850
+ "loss": 0.2807,
851
+ "step": 370
852
+ },
853
+ {
854
+ "epoch": 5.211267605633803,
855
+ "eval_loss": 0.5343714952468872,
856
+ "eval_runtime": 252.932,
857
+ "eval_samples_per_second": 1.977,
858
+ "eval_steps_per_second": 0.249,
859
+ "eval_wer": 35.51637279596977,
860
+ "step": 370
861
+ },
862
+ {
863
+ "epoch": 5.28169014084507,
864
+ "grad_norm": 5.650921821594238,
865
+ "learning_rate": 7.500000000000001e-06,
866
+ "loss": 0.3147,
867
+ "step": 375
868
+ },
869
+ {
870
+ "epoch": 5.352112676056338,
871
+ "grad_norm": 5.143499374389648,
872
+ "learning_rate": 7.600000000000001e-06,
873
+ "loss": 0.2612,
874
+ "step": 380
875
+ },
876
+ {
877
+ "epoch": 5.352112676056338,
878
+ "eval_loss": 0.5304917097091675,
879
+ "eval_runtime": 252.0942,
880
+ "eval_samples_per_second": 1.983,
881
+ "eval_steps_per_second": 0.25,
882
+ "eval_wer": 34.66624685138539,
883
+ "step": 380
884
+ },
885
+ {
886
+ "epoch": 5.422535211267606,
887
+ "grad_norm": 5.593881607055664,
888
+ "learning_rate": 7.7e-06,
889
+ "loss": 0.2606,
890
+ "step": 385
891
+ },
892
+ {
893
+ "epoch": 5.492957746478873,
894
+ "grad_norm": 5.4485392570495605,
895
+ "learning_rate": 7.800000000000002e-06,
896
+ "loss": 0.2762,
897
+ "step": 390
898
+ },
899
+ {
900
+ "epoch": 5.492957746478873,
901
+ "eval_loss": 0.5305802226066589,
902
+ "eval_runtime": 252.0179,
903
+ "eval_samples_per_second": 1.984,
904
+ "eval_steps_per_second": 0.25,
905
+ "eval_wer": 32.27329974811083,
906
+ "step": 390
907
+ },
908
+ {
909
+ "epoch": 5.563380281690141,
910
+ "grad_norm": 4.250403881072998,
911
+ "learning_rate": 7.9e-06,
912
+ "loss": 0.2609,
913
+ "step": 395
914
+ },
915
+ {
916
+ "epoch": 5.633802816901408,
917
+ "grad_norm": 5.564484596252441,
918
+ "learning_rate": 8.000000000000001e-06,
919
+ "loss": 0.299,
920
+ "step": 400
921
+ },
922
+ {
923
+ "epoch": 5.633802816901408,
924
+ "eval_loss": 0.5266876220703125,
925
+ "eval_runtime": 251.1581,
926
+ "eval_samples_per_second": 1.991,
927
+ "eval_steps_per_second": 0.251,
928
+ "eval_wer": 36.87027707808564,
929
+ "step": 400
930
+ },
931
+ {
932
+ "epoch": 5.704225352112676,
933
+ "grad_norm": 4.646668910980225,
934
+ "learning_rate": 8.1e-06,
935
+ "loss": 0.2368,
936
+ "step": 405
937
+ },
938
+ {
939
+ "epoch": 5.774647887323944,
940
+ "grad_norm": 5.00687313079834,
941
+ "learning_rate": 8.2e-06,
942
+ "loss": 0.2718,
943
+ "step": 410
944
+ },
945
+ {
946
+ "epoch": 5.774647887323944,
947
+ "eval_loss": 0.5231830477714539,
948
+ "eval_runtime": 252.6711,
949
+ "eval_samples_per_second": 1.979,
950
+ "eval_steps_per_second": 0.249,
951
+ "eval_wer": 41.68765743073048,
952
+ "step": 410
953
+ },
954
+ {
955
+ "epoch": 5.845070422535211,
956
+ "grad_norm": 4.078917503356934,
957
+ "learning_rate": 8.3e-06,
958
+ "loss": 0.252,
959
+ "step": 415
960
+ },
961
+ {
962
+ "epoch": 5.915492957746479,
963
+ "grad_norm": 4.877511501312256,
964
+ "learning_rate": 8.400000000000001e-06,
965
+ "loss": 0.2618,
966
+ "step": 420
967
+ },
968
+ {
969
+ "epoch": 5.915492957746479,
970
+ "eval_loss": 0.5207710266113281,
971
+ "eval_runtime": 252.1118,
972
+ "eval_samples_per_second": 1.983,
973
+ "eval_steps_per_second": 0.25,
974
+ "eval_wer": 34.09949622166247,
975
+ "step": 420
976
+ },
977
+ {
978
+ "epoch": 5.985915492957746,
979
+ "grad_norm": 5.141012191772461,
980
+ "learning_rate": 8.5e-06,
981
+ "loss": 0.3232,
982
+ "step": 425
983
+ },
984
+ {
985
+ "epoch": 6.056338028169014,
986
+ "grad_norm": 4.299196243286133,
987
+ "learning_rate": 8.6e-06,
988
+ "loss": 0.2121,
989
+ "step": 430
990
+ },
991
+ {
992
+ "epoch": 6.056338028169014,
993
+ "eval_loss": 0.5220197439193726,
994
+ "eval_runtime": 252.0399,
995
+ "eval_samples_per_second": 1.984,
996
+ "eval_steps_per_second": 0.25,
997
+ "eval_wer": 28.05415617128463,
998
+ "step": 430
999
+ },
1000
+ {
1001
+ "epoch": 6.126760563380282,
1002
+ "grad_norm": 3.769075393676758,
1003
+ "learning_rate": 8.700000000000001e-06,
1004
+ "loss": 0.2119,
1005
+ "step": 435
1006
+ },
1007
+ {
1008
+ "epoch": 6.197183098591549,
1009
+ "grad_norm": 4.311405181884766,
1010
+ "learning_rate": 8.8e-06,
1011
+ "loss": 0.1929,
1012
+ "step": 440
1013
+ },
1014
+ {
1015
+ "epoch": 6.197183098591549,
1016
+ "eval_loss": 0.5256190299987793,
1017
+ "eval_runtime": 253.2092,
1018
+ "eval_samples_per_second": 1.975,
1019
+ "eval_steps_per_second": 0.249,
1020
+ "eval_wer": 35.79974811083124,
1021
+ "step": 440
1022
+ },
1023
+ {
1024
+ "epoch": 6.267605633802817,
1025
+ "grad_norm": 3.735041618347168,
1026
+ "learning_rate": 8.900000000000001e-06,
1027
+ "loss": 0.2104,
1028
+ "step": 445
1029
+ },
1030
+ {
1031
+ "epoch": 6.338028169014084,
1032
+ "grad_norm": 6.507180690765381,
1033
+ "learning_rate": 9e-06,
1034
+ "loss": 0.2504,
1035
+ "step": 450
1036
+ },
1037
+ {
1038
+ "epoch": 6.338028169014084,
1039
+ "eval_loss": 0.529583215713501,
1040
+ "eval_runtime": 252.3402,
1041
+ "eval_samples_per_second": 1.981,
1042
+ "eval_steps_per_second": 0.25,
1043
+ "eval_wer": 32.87153652392947,
1044
+ "step": 450
1045
+ },
1046
+ {
1047
+ "epoch": 6.408450704225352,
1048
+ "grad_norm": 4.1670355796813965,
1049
+ "learning_rate": 9.100000000000001e-06,
1050
+ "loss": 0.1931,
1051
+ "step": 455
1052
+ },
1053
+ {
1054
+ "epoch": 6.47887323943662,
1055
+ "grad_norm": 4.260618209838867,
1056
+ "learning_rate": 9.200000000000002e-06,
1057
+ "loss": 0.2064,
1058
+ "step": 460
1059
+ },
1060
+ {
1061
+ "epoch": 6.47887323943662,
1062
+ "eval_loss": 0.5265011191368103,
1063
+ "eval_runtime": 253.5935,
1064
+ "eval_samples_per_second": 1.972,
1065
+ "eval_steps_per_second": 0.248,
1066
+ "eval_wer": 35.3904282115869,
1067
+ "step": 460
1068
+ },
1069
+ {
1070
+ "epoch": 6.549295774647887,
1071
+ "grad_norm": 4.580427169799805,
1072
+ "learning_rate": 9.3e-06,
1073
+ "loss": 0.2099,
1074
+ "step": 465
1075
+ },
1076
+ {
1077
+ "epoch": 6.619718309859155,
1078
+ "grad_norm": 5.135242938995361,
1079
+ "learning_rate": 9.4e-06,
1080
+ "loss": 0.2044,
1081
+ "step": 470
1082
+ },
1083
+ {
1084
+ "epoch": 6.619718309859155,
1085
+ "eval_loss": 0.5266779065132141,
1086
+ "eval_runtime": 253.6172,
1087
+ "eval_samples_per_second": 1.971,
1088
+ "eval_steps_per_second": 0.248,
1089
+ "eval_wer": 38.31863979848866,
1090
+ "step": 470
1091
+ },
1092
+ {
1093
+ "epoch": 6.690140845070422,
1094
+ "grad_norm": 4.770451545715332,
1095
+ "learning_rate": 9.5e-06,
1096
+ "loss": 0.2118,
1097
+ "step": 475
1098
+ },
1099
+ {
1100
+ "epoch": 6.76056338028169,
1101
+ "grad_norm": 4.276612758636475,
1102
+ "learning_rate": 9.600000000000001e-06,
1103
+ "loss": 0.1844,
1104
+ "step": 480
1105
+ },
1106
+ {
1107
+ "epoch": 6.76056338028169,
1108
+ "eval_loss": 0.5231460332870483,
1109
+ "eval_runtime": 253.7339,
1110
+ "eval_samples_per_second": 1.971,
1111
+ "eval_steps_per_second": 0.248,
1112
+ "eval_wer": 35.107052896725435,
1113
+ "step": 480
1114
+ },
1115
+ {
1116
+ "epoch": 6.830985915492958,
1117
+ "grad_norm": 6.741299152374268,
1118
+ "learning_rate": 9.7e-06,
1119
+ "loss": 0.2276,
1120
+ "step": 485
1121
+ },
1122
+ {
1123
+ "epoch": 6.901408450704225,
1124
+ "grad_norm": 5.4448370933532715,
1125
+ "learning_rate": 9.800000000000001e-06,
1126
+ "loss": 0.1867,
1127
+ "step": 490
1128
+ },
1129
+ {
1130
+ "epoch": 6.901408450704225,
1131
+ "eval_loss": 0.5235409140586853,
1132
+ "eval_runtime": 252.1039,
1133
+ "eval_samples_per_second": 1.983,
1134
+ "eval_steps_per_second": 0.25,
1135
+ "eval_wer": 31.580604534005037,
1136
+ "step": 490
1137
+ },
1138
+ {
1139
+ "epoch": 6.971830985915493,
1140
+ "grad_norm": 5.26415491104126,
1141
+ "learning_rate": 9.9e-06,
1142
+ "loss": 0.2232,
1143
+ "step": 495
1144
+ },
1145
+ {
1146
+ "epoch": 7.042253521126761,
1147
+ "grad_norm": 3.9737112522125244,
1148
+ "learning_rate": 0.0,
1149
+ "loss": 0.1562,
1150
+ "step": 500
1151
+ },
1152
+ {
1153
+ "epoch": 7.042253521126761,
1154
+ "eval_loss": 0.5233400464057922,
1155
+ "eval_runtime": 252.8742,
1156
+ "eval_samples_per_second": 1.977,
1157
+ "eval_steps_per_second": 0.249,
1158
+ "eval_wer": 31.10831234256927,
1159
+ "step": 500
1160
+ },
1161
+ {
1162
+ "epoch": 7.042253521126761,
1163
+ "step": 500,
1164
+ "total_flos": 7.8022170722304e+17,
1165
+ "train_loss": 0.9523631989955902,
1166
+ "train_runtime": 13251.4495,
1167
+ "train_samples_per_second": 2.415,
1168
+ "train_steps_per_second": 0.038
1169
+ }
1170
+ ],
1171
+ "logging_steps": 5,
1172
+ "max_steps": 500,
1173
+ "num_input_tokens_seen": 0,
1174
+ "num_train_epochs": 8,
1175
+ "save_steps": 10,
1176
+ "total_flos": 7.8022170722304e+17,
1177
+ "train_batch_size": 64,
1178
+ "trial_name": null,
1179
+ "trial_params": null
1180
+ }
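
The `best_metric` and `best_model_checkpoint` fields can be cross-checked against `log_history`. The sketch below assumes the tracked metric is `eval_wer` and that lower is better, neither of which is stated in the file itself. The helper name `best_eval_entry` is hypothetical, and the excerpt copies a few entries from the log above instead of reading the file.

```python
def best_eval_entry(state: dict, metric: str = "eval_wer") -> dict:
    """Return the log_history entry with the lowest value of `metric`.

    Training-loss entries do not carry `metric`, so they are skipped.
    """
    evals = [entry for entry in state["log_history"] if metric in entry]
    if not evals:
        raise ValueError(f"no log entries contain {metric!r}")
    return min(evals, key=lambda entry: entry[metric])


# A few entries copied from the trainer_state.json in this commit.
excerpt = {
    "best_metric": 28.05415617128463,
    "best_model_checkpoint": "whisper2/checkpoint-430",
    "log_history": [
        {"step": 420, "eval_loss": 0.5207710266113281, "eval_wer": 34.09949622166247},
        {"step": 430, "loss": 0.2121, "learning_rate": 8.6e-06},
        {"step": 430, "eval_loss": 0.5220197439193726, "eval_wer": 28.05415617128463},
        {"step": 500, "eval_loss": 0.5233400464057922, "eval_wer": 31.10831234256927},
    ],
}

best = best_eval_entry(excerpt)
print(best["step"], best["eval_wer"])
```

To run this on the full file, replace `excerpt` with the dictionary returned by `json.load` on `trainer_state.json`. The Loss and Wer values in the README (0.5233, 31.1083) match the final evaluation at step 500 after rounding, not the best-WER checkpoint at step 430.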