Masaki Hattori committed on
Commit 992bfd6
1 Parent(s): 8691953

Model save

Files changed (2)
  1. README.md +73 -0
  2. trainer_state.json +608 -0
README.md ADDED
@@ -0,0 +1,73 @@
+ ---
+ license: apache-2.0
+ base_model: rinna/japanese-hubert-base
+ tags:
+ - generated_from_trainer
+ metrics:
+ - wer
+ model-index:
+ - name: hubert-rinnna-jp-jdrtsp-fw07sp-12
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # hubert-rinnna-jp-jdrtsp-fw07sp-12
+
+ This model is a fine-tuned version of [rinna/japanese-hubert-base](https://huggingface.co/rinna/japanese-hubert-base) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.1989
+ - Wer: 0.6801
+ - Cer: 0.5794
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1e-05
+ - train_batch_size: 32
+ - eval_batch_size: 16
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 1000
+ - num_epochs: 10
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
+ |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
+ | 5.0318 | 1.0 | 404 | 4.2999 | 0.9798 | 0.9889 |
+ | 3.5113 | 2.0 | 808 | 3.3289 | 0.9798 | 0.9889 |
+ | 2.7536 | 3.0 | 1212 | 2.7007 | 0.9798 | 0.9889 |
+ | 2.4826 | 4.0 | 1616 | 2.3732 | 0.9798 | 0.9889 |
+ | 2.0642 | 5.0 | 2020 | 1.9165 | 0.9798 | 0.9888 |
+ | 1.834 | 6.0 | 2424 | 1.6739 | 0.9504 | 0.9464 |
+ | 1.6869 | 7.0 | 2828 | 1.4651 | 0.8239 | 0.7865 |
+ | 1.5734 | 8.0 | 3232 | 1.3267 | 0.7440 | 0.6939 |
+ | 1.5052 | 9.0 | 3636 | 1.2331 | 0.7045 | 0.6231 |
+ | 1.4573 | 10.0 | 4040 | 1.1989 | 0.6801 | 0.5794 |
+
+
+ ### Framework versions
+
+ - Transformers 4.34.0.dev0
+ - Pytorch 2.0.1+cu118
+ - Datasets 2.14.5
+ - Tokenizers 0.13.3
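
The card leaves intended uses open, but the WER/CER metrics imply a CTC-style speech recognition head on top of the HuBERT encoder. Below is a minimal inference sketch, not part of this commit: it assumes a Transformers-compatible processor is saved alongside the model and uses the model-index name as a stand-in for the real repo path or local checkpoint directory.

```python
# Hypothetical usage sketch (not from the committed files). Assumes the
# checkpoint is a CTC model with a bundled processor; adjust model_path to the
# actual Hub repo id or local directory.
import torch
import librosa
from transformers import AutoModelForCTC, AutoProcessor

model_path = "hubert-rinnna-jp-jdrtsp-fw07sp-12"  # assumed path
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForCTC.from_pretrained(model_path)
model.eval()

# rinna/japanese-hubert-base expects 16 kHz mono audio.
speech, _ = librosa.load("sample.wav", sr=16000)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```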
trainer_state.json ADDED
@@ -0,0 +1,608 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 10.0,
+ "eval_steps": 500,
+ "global_step": 4040,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.12,
+ "learning_rate": 5.000000000000001e-07,
+ "loss": 10.9669,
+ "step": 50
+ },
+ {
+ "epoch": 0.25,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 10.7574,
+ "step": 100
+ },
+ {
+ "epoch": 0.37,
+ "learning_rate": 1.5e-06,
+ "loss": 10.4847,
+ "step": 150
+ },
+ {
+ "epoch": 0.5,
+ "learning_rate": 2.0000000000000003e-06,
+ "loss": 9.3795,
+ "step": 200
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 2.5e-06,
+ "loss": 7.4952,
+ "step": 250
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 3e-06,
+ "loss": 6.3533,
+ "step": 300
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 3.5e-06,
+ "loss": 5.356,
+ "step": 350
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 4.000000000000001e-06,
+ "loss": 5.0318,
+ "step": 400
+ },
+ {
+ "epoch": 1.0,
+ "eval_cer": 0.9888512643814729,
+ "eval_loss": 4.29986047744751,
+ "eval_runtime": 38.3788,
+ "eval_samples_per_second": 168.426,
+ "eval_steps_per_second": 10.527,
+ "eval_wer": 0.9797794777861581,
+ "step": 404
+ },
+ {
+ "epoch": 1.11,
+ "learning_rate": 4.5e-06,
+ "loss": 4.7526,
+ "step": 450
+ },
+ {
+ "epoch": 1.24,
+ "learning_rate": 5e-06,
+ "loss": 4.5027,
+ "step": 500
+ },
+ {
+ "epoch": 1.36,
+ "learning_rate": 5.500000000000001e-06,
+ "loss": 4.3223,
+ "step": 550
+ },
+ {
+ "epoch": 1.49,
+ "learning_rate": 6e-06,
+ "loss": 4.0849,
+ "step": 600
+ },
+ {
+ "epoch": 1.61,
+ "learning_rate": 6.5000000000000004e-06,
+ "loss": 3.9545,
+ "step": 650
+ },
+ {
+ "epoch": 1.73,
+ "learning_rate": 7e-06,
+ "loss": 3.7431,
+ "step": 700
+ },
+ {
+ "epoch": 1.86,
+ "learning_rate": 7.500000000000001e-06,
+ "loss": 3.6413,
+ "step": 750
+ },
+ {
+ "epoch": 1.98,
+ "learning_rate": 8.000000000000001e-06,
+ "loss": 3.5113,
+ "step": 800
+ },
+ {
+ "epoch": 2.0,
+ "eval_cer": 0.9888512643814729,
+ "eval_loss": 3.3288848400115967,
+ "eval_runtime": 39.0341,
+ "eval_samples_per_second": 165.599,
+ "eval_steps_per_second": 10.35,
+ "eval_wer": 0.9797794777861581,
+ "step": 808
+ },
+ {
+ "epoch": 2.1,
+ "learning_rate": 8.5e-06,
+ "loss": 3.3907,
+ "step": 850
+ },
+ {
+ "epoch": 2.23,
+ "learning_rate": 9e-06,
+ "loss": 3.2836,
+ "step": 900
+ },
+ {
+ "epoch": 2.35,
+ "learning_rate": 9.5e-06,
+ "loss": 3.1635,
+ "step": 950
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 1e-05,
+ "loss": 3.0437,
+ "step": 1000
+ },
+ {
+ "epoch": 2.6,
+ "learning_rate": 9.835526315789474e-06,
+ "loss": 2.9567,
+ "step": 1050
+ },
+ {
+ "epoch": 2.72,
+ "learning_rate": 9.671052631578948e-06,
+ "loss": 2.8632,
+ "step": 1100
+ },
+ {
+ "epoch": 2.85,
+ "learning_rate": 9.506578947368423e-06,
+ "loss": 2.8078,
+ "step": 1150
+ },
+ {
+ "epoch": 2.97,
+ "learning_rate": 9.342105263157895e-06,
+ "loss": 2.7536,
+ "step": 1200
+ },
+ {
+ "epoch": 3.0,
+ "eval_cer": 0.9888512643814729,
+ "eval_loss": 2.700655460357666,
+ "eval_runtime": 38.3318,
+ "eval_samples_per_second": 168.633,
+ "eval_steps_per_second": 10.54,
+ "eval_wer": 0.9797794777861581,
+ "step": 1212
+ },
+ {
+ "epoch": 3.09,
+ "learning_rate": 9.17763157894737e-06,
+ "loss": 2.7127,
+ "step": 1250
+ },
+ {
+ "epoch": 3.22,
+ "learning_rate": 9.013157894736843e-06,
+ "loss": 2.6749,
+ "step": 1300
+ },
+ {
+ "epoch": 3.34,
+ "learning_rate": 8.848684210526316e-06,
+ "loss": 2.6572,
+ "step": 1350
+ },
+ {
+ "epoch": 3.47,
+ "learning_rate": 8.68421052631579e-06,
+ "loss": 2.6236,
+ "step": 1400
+ },
+ {
+ "epoch": 3.59,
+ "learning_rate": 8.519736842105265e-06,
+ "loss": 2.6007,
+ "step": 1450
+ },
+ {
+ "epoch": 3.71,
+ "learning_rate": 8.355263157894737e-06,
+ "loss": 2.5653,
+ "step": 1500
+ },
+ {
+ "epoch": 3.84,
+ "learning_rate": 8.19078947368421e-06,
+ "loss": 2.5333,
+ "step": 1550
+ },
+ {
+ "epoch": 3.96,
+ "learning_rate": 8.026315789473685e-06,
+ "loss": 2.4826,
+ "step": 1600
+ },
+ {
+ "epoch": 4.0,
+ "eval_cer": 0.9888512643814729,
+ "eval_loss": 2.373181104660034,
+ "eval_runtime": 37.8617,
+ "eval_samples_per_second": 170.727,
+ "eval_steps_per_second": 10.67,
+ "eval_wer": 0.9797794777861581,
+ "step": 1616
+ },
+ {
+ "epoch": 4.08,
+ "learning_rate": 7.86184210526316e-06,
+ "loss": 2.4176,
+ "step": 1650
+ },
+ {
+ "epoch": 4.21,
+ "learning_rate": 7.697368421052632e-06,
+ "loss": 2.371,
+ "step": 1700
+ },
+ {
+ "epoch": 4.33,
+ "learning_rate": 7.532894736842106e-06,
+ "loss": 2.302,
+ "step": 1750
+ },
+ {
+ "epoch": 4.46,
+ "learning_rate": 7.368421052631579e-06,
+ "loss": 2.2307,
+ "step": 1800
+ },
+ {
+ "epoch": 4.58,
+ "learning_rate": 7.203947368421054e-06,
+ "loss": 2.1823,
+ "step": 1850
+ },
+ {
+ "epoch": 4.7,
+ "learning_rate": 7.0394736842105274e-06,
+ "loss": 2.157,
+ "step": 1900
+ },
+ {
+ "epoch": 4.83,
+ "learning_rate": 6.875e-06,
+ "loss": 2.1087,
+ "step": 1950
+ },
+ {
+ "epoch": 4.95,
+ "learning_rate": 6.710526315789474e-06,
+ "loss": 2.0642,
+ "step": 2000
+ },
+ {
+ "epoch": 5.0,
+ "eval_cer": 0.9888314620091487,
+ "eval_loss": 1.9164613485336304,
+ "eval_runtime": 38.0086,
+ "eval_samples_per_second": 170.067,
+ "eval_steps_per_second": 10.629,
+ "eval_wer": 0.9797794777861581,
+ "step": 2020
+ },
+ {
+ "epoch": 5.07,
+ "learning_rate": 6.5460526315789476e-06,
+ "loss": 2.0318,
+ "step": 2050
+ },
+ {
+ "epoch": 5.2,
+ "learning_rate": 6.381578947368422e-06,
+ "loss": 2.0039,
+ "step": 2100
+ },
+ {
+ "epoch": 5.32,
+ "learning_rate": 6.217105263157896e-06,
+ "loss": 1.97,
+ "step": 2150
+ },
+ {
+ "epoch": 5.45,
+ "learning_rate": 6.0526315789473685e-06,
+ "loss": 1.9502,
+ "step": 2200
+ },
+ {
+ "epoch": 5.57,
+ "learning_rate": 5.888157894736842e-06,
+ "loss": 1.9235,
+ "step": 2250
+ },
+ {
+ "epoch": 5.69,
+ "learning_rate": 5.723684210526316e-06,
+ "loss": 1.8816,
+ "step": 2300
+ },
+ {
+ "epoch": 5.82,
+ "learning_rate": 5.55921052631579e-06,
+ "loss": 1.8675,
+ "step": 2350
+ },
+ {
+ "epoch": 5.94,
+ "learning_rate": 5.394736842105264e-06,
+ "loss": 1.834,
+ "step": 2400
+ },
+ {
+ "epoch": 6.0,
+ "eval_cer": 0.9463949781183786,
+ "eval_loss": 1.6738649606704712,
+ "eval_runtime": 38.6983,
+ "eval_samples_per_second": 167.036,
+ "eval_steps_per_second": 10.44,
+ "eval_wer": 0.9504004597205761,
+ "step": 2424
+ },
+ {
+ "epoch": 6.06,
+ "learning_rate": 5.230263157894737e-06,
+ "loss": 1.8211,
+ "step": 2450
+ },
+ {
+ "epoch": 6.19,
+ "learning_rate": 5.0657894736842104e-06,
+ "loss": 1.7972,
+ "step": 2500
+ },
+ {
+ "epoch": 6.31,
+ "learning_rate": 4.901315789473685e-06,
+ "loss": 1.7812,
+ "step": 2550
+ },
+ {
+ "epoch": 6.44,
+ "learning_rate": 4.736842105263158e-06,
+ "loss": 1.7627,
+ "step": 2600
+ },
+ {
+ "epoch": 6.56,
+ "learning_rate": 4.572368421052632e-06,
+ "loss": 1.7391,
+ "step": 2650
+ },
+ {
+ "epoch": 6.68,
+ "learning_rate": 4.407894736842105e-06,
+ "loss": 1.7241,
+ "step": 2700
+ },
+ {
+ "epoch": 6.81,
+ "learning_rate": 4.2434210526315796e-06,
+ "loss": 1.7089,
+ "step": 2750
+ },
+ {
+ "epoch": 6.93,
+ "learning_rate": 4.078947368421053e-06,
+ "loss": 1.6869,
+ "step": 2800
+ },
+ {
+ "epoch": 7.0,
+ "eval_cer": 0.7864710192281035,
+ "eval_loss": 1.4651445150375366,
+ "eval_runtime": 38.8273,
+ "eval_samples_per_second": 166.481,
+ "eval_steps_per_second": 10.405,
+ "eval_wer": 0.8239413856265488,
+ "step": 2828
+ },
+ {
+ "epoch": 7.05,
+ "learning_rate": 3.914473684210527e-06,
+ "loss": 1.6732,
+ "step": 2850
+ },
+ {
+ "epoch": 7.18,
+ "learning_rate": 3.7500000000000005e-06,
+ "loss": 1.6575,
+ "step": 2900
+ },
+ {
+ "epoch": 7.3,
+ "learning_rate": 3.5855263157894737e-06,
+ "loss": 1.6262,
+ "step": 2950
+ },
+ {
+ "epoch": 7.43,
+ "learning_rate": 3.421052631578948e-06,
+ "loss": 1.635,
+ "step": 3000
+ },
+ {
+ "epoch": 7.55,
+ "learning_rate": 3.256578947368421e-06,
+ "loss": 1.6008,
+ "step": 3050
+ },
+ {
+ "epoch": 7.67,
+ "learning_rate": 3.092105263157895e-06,
+ "loss": 1.5957,
+ "step": 3100
+ },
+ {
+ "epoch": 7.8,
+ "learning_rate": 2.927631578947369e-06,
+ "loss": 1.5809,
+ "step": 3150
+ },
+ {
+ "epoch": 7.92,
+ "learning_rate": 2.7631578947368424e-06,
+ "loss": 1.5734,
+ "step": 3200
+ },
+ {
+ "epoch": 8.0,
+ "eval_cer": 0.693914730984772,
+ "eval_loss": 1.3266735076904297,
+ "eval_runtime": 37.9957,
+ "eval_samples_per_second": 170.125,
+ "eval_steps_per_second": 10.633,
+ "eval_wer": 0.7439571885213518,
+ "step": 3232
+ },
+ {
+ "epoch": 8.04,
+ "learning_rate": 2.598684210526316e-06,
+ "loss": 1.5604,
+ "step": 3250
+ },
+ {
+ "epoch": 8.17,
+ "learning_rate": 2.4342105263157898e-06,
+ "loss": 1.5457,
+ "step": 3300
+ },
+ {
+ "epoch": 8.29,
+ "learning_rate": 2.2697368421052634e-06,
+ "loss": 1.5277,
+ "step": 3350
+ },
+ {
+ "epoch": 8.42,
+ "learning_rate": 2.105263157894737e-06,
+ "loss": 1.53,
+ "step": 3400
+ },
+ {
+ "epoch": 8.54,
+ "learning_rate": 1.9407894736842107e-06,
+ "loss": 1.5365,
+ "step": 3450
+ },
+ {
+ "epoch": 8.66,
+ "learning_rate": 1.7763157894736844e-06,
+ "loss": 1.5237,
+ "step": 3500
+ },
+ {
+ "epoch": 8.79,
+ "learning_rate": 1.611842105263158e-06,
+ "loss": 1.5106,
+ "step": 3550
+ },
+ {
+ "epoch": 8.91,
+ "learning_rate": 1.4473684210526317e-06,
+ "loss": 1.5052,
+ "step": 3600
+ },
+ {
+ "epoch": 9.0,
+ "eval_cer": 0.6230816451810927,
+ "eval_loss": 1.233142614364624,
+ "eval_runtime": 39.1396,
+ "eval_samples_per_second": 165.152,
+ "eval_steps_per_second": 10.322,
+ "eval_wer": 0.7045217828538591,
+ "step": 3636
+ },
+ {
+ "epoch": 9.03,
+ "learning_rate": 1.2828947368421055e-06,
+ "loss": 1.4932,
+ "step": 3650
+ },
+ {
+ "epoch": 9.16,
+ "learning_rate": 1.118421052631579e-06,
+ "loss": 1.4764,
+ "step": 3700
+ },
+ {
+ "epoch": 9.28,
+ "learning_rate": 9.539473684210528e-07,
+ "loss": 1.4474,
+ "step": 3750
+ },
+ {
+ "epoch": 9.41,
+ "learning_rate": 7.894736842105263e-07,
+ "loss": 1.4755,
+ "step": 3800
+ },
+ {
+ "epoch": 9.53,
+ "learning_rate": 6.25e-07,
+ "loss": 1.481,
+ "step": 3850
+ },
+ {
+ "epoch": 9.65,
+ "learning_rate": 4.605263157894737e-07,
+ "loss": 1.4635,
+ "step": 3900
+ },
+ {
+ "epoch": 9.78,
+ "learning_rate": 2.9605263157894736e-07,
+ "loss": 1.4576,
+ "step": 3950
+ },
+ {
+ "epoch": 9.9,
+ "learning_rate": 1.3157894736842107e-07,
+ "loss": 1.4573,
+ "step": 4000
+ },
+ {
+ "epoch": 10.0,
+ "eval_cer": 0.5793976118338977,
+ "eval_loss": 1.198919653892517,
+ "eval_runtime": 38.6365,
+ "eval_samples_per_second": 167.303,
+ "eval_steps_per_second": 10.456,
+ "eval_wer": 0.680063211579212,
+ "step": 4040
+ },
+ {
+ "epoch": 10.0,
+ "step": 4040,
+ "total_flos": 2.1546455399489285e+18,
+ "train_loss": 2.8324050827781755,
+ "train_runtime": 8680.4972,
+ "train_samples_per_second": 29.786,
+ "train_steps_per_second": 0.465
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 4040,
+ "num_train_epochs": 10,
+ "save_steps": 500,
+ "total_flos": 2.1546455399489285e+18,
+ "trial_name": null,
+ "trial_params": null
+ }
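
The hyperparameters in the card and the bookkeeping fields in trainer_state.json (logging every 50 steps, checkpoints every 500 steps, evaluation at each epoch boundary, 1000 warmup steps on a linear schedule) map onto a Trainer configuration along these lines. This is a hedged reconstruction, not the original training script; the output_dir and evaluation_strategy values are assumptions, and dataset wiring, the CTC data collator, and compute_metrics are omitted.

```python
# Approximate TrainingArguments implied by the model card and trainer_state.json.
# Values not stated in the card (output_dir, evaluation_strategy) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hubert-rinnna-jp-jdrtsp-fw07sp-12",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,   # effective train batch size of 64
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    seed=42,
    logging_steps=50,
    save_steps=500,
    evaluation_strategy="epoch",     # eval entries appear at epoch boundaries in the log
)
```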