ccore committed on
Commit 1a2a470
1 Parent(s): 1d92770

Upload folder using huggingface_hub
README.md CHANGED
@@ -12,37 +12,47 @@ model-index:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
 
- # Model Card: LLama 2 - Version 7b (Embedding + Output + 1 Hidden Layer)
 
- ## Overview
 
- - **Link to Training Progress:** [WandB Training Progress](https://wandb.ai/inteligenciaartificialcursos/huggingface/runs/pv74nzw8?workspace=user-inteligenciaartificialcursos)
 
- - **Model Name:** LLama 2 - Version 7b
 
- - **Total Parameters:** 446 million
 
- ## Training Data
 
- The model has been trained with the following sequence of datasets:
 
- 1. **GPT-2 Data (DONE):** The initial training phase involves GPT-2 data and is currently in the finalization stage.
 
- 2. **Wikipedia QA in Markdown (In Progress):** The model's training will continue with Wikipedia question-answering data in Markdown format.
 
- 3. **QA with Rhetoric (Future Stages):** The model will further be fine-tuned with question-answering data generated from various LLama models, incorporating rhetorical elements.
 
- ## Model Description
 
- The LLama 2 - Version 7b model is a powerful language model with a total of 446 million parameters. It utilizes embeddings, an output layer, and one hidden layer to perform a wide range of natural language processing tasks. The training is conducted in multiple stages, each focused on different datasets and objectives.
 
- ## Disclaimer
 
- This model card provides an overview of the LLama 2 - Version 7b model, its training data, and intended use cases. Keep in mind that the model's performance may vary depending on the specific task or dataset. Users are encouraged to evaluate the model's suitability for their applications and exercise caution when using it in real-world scenarios.
 
- For any further inquiries or issues related to this model, please contact the model developers through the provided training progress link.
-
- ---
-
- Feel free to customize this Model Card further if you have additional details or specific use cases you'd like to highlight.

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
+ # core2
 
+ This model is a fine-tuned version of [./core2](https://huggingface.co/./core2) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 2.7608
+ - Accuracy: 0.4077
 
+ ## Model description
 
+ More information needed
 
+ ## Intended uses & limitations
 
+ More information needed
 
+ ## Training and evaluation data
 
+ More information needed
 
+ ## Training procedure
 
+ ### Training hyperparameters
 
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 1
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 32
+ - total_train_batch_size: 32
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 1.0
 
+ ### Training results
 
 
 
+ ### Framework versions
 
+ - Transformers 4.34.0.dev0
+ - Pytorch 2.0.1+cu117
+ - Datasets 2.14.5
+ - Tokenizers 0.13.3
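The hyperparameter list above determines how many samples each optimizer step consumes. Combined with `train_samples` from `all_results.json`, it also gives a rough count of the optimizer steps in one epoch. This is a minimal Python sketch under two assumptions: a single device (the card does not state the device count), and a trailing partial accumulation window that is not counted as a step. The helper names are hypothetical.

```python
# Sketch: effective batch size and optimizer steps for one epoch.
# Assumptions (not stated in the card): one device, and a trailing partial
# gradient-accumulation window is dropped (floor division).

def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def optimizer_steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of full optimizer steps that fit in one pass over the data."""
    return num_samples // batch_size


# train_batch_size: 1, gradient_accumulation_steps: 32
total = effective_batch_size(per_device_batch=1, grad_accum_steps=32)
# train_samples: 119233 (new all_results.json)
steps = optimizer_steps_per_epoch(num_samples=119233, batch_size=total)
print(total, steps)
print(round(steps / 3102.8464, 3))  # steps divided by train_runtime in seconds
```

Under these assumptions the run took 3726 optimizer steps. Dividing by the 3102.8464 s runtime gives about 1.201 steps/s, which matches the reported `train_steps_per_second`.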
 
all_results.json CHANGED
@@ -1,15 +1,15 @@
  {
  "epoch": 1.0,
- "eval_accuracy": 0.4413527624330325,
- "eval_loss": 2.4912242889404297,
- "eval_runtime": 1.7817,
  "eval_samples": 129,
- "eval_samples_per_second": 72.402,
- "eval_steps_per_second": 9.541,
- "perplexity": 12.076051650319789,
- "train_loss": 2.5232514588412474,
- "train_runtime": 5052.6511,
- "train_samples": 175957,
- "train_samples_per_second": 34.825,
- "train_steps_per_second": 4.353
  }

  {
  "epoch": 1.0,
+ "eval_accuracy": 0.4077155652549501,
+ "eval_loss": 2.7608211040496826,
+ "eval_runtime": 1.7213,
  "eval_samples": 129,
+ "eval_samples_per_second": 74.942,
+ "eval_steps_per_second": 9.876,
+ "perplexity": 15.81282159096841,
+ "train_loss": 2.7717256223521947,
+ "train_runtime": 3102.8464,
+ "train_samples": 119233,
+ "train_samples_per_second": 38.427,
+ "train_steps_per_second": 1.201
  }
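The throughput fields in `all_results.json` can be cross-checked against the sample counts and runtimes. This sketch rests on two assumptions about how the metrics were produced, neither of which the commit states. First, each rate is a count divided by wall-clock runtime. Second, the 129 evaluation samples fall into ceil(129 / 8) = 17 batches, with the final short batch kept.

```python
import math


def per_second(count: float, runtime_s: float) -> float:
    """Throughput: a count (samples or steps) divided by wall-clock seconds."""
    return count / runtime_s


# Values copied from the new all_results.json and the card's eval_batch_size.
eval_samples, eval_runtime, eval_batch_size = 129, 1.7213, 8
train_samples, train_runtime = 119233, 3102.8464

# Assumption: the last, smaller eval batch is kept (no drop_last).
eval_steps = math.ceil(eval_samples / eval_batch_size)

print(f"eval samples/s:  {per_second(eval_samples, eval_runtime):.3f}")
print(f"eval steps/s:    {per_second(eval_steps, eval_runtime):.3f}")
print(f"train samples/s: {per_second(train_samples, train_runtime):.3f}")
```

Expect occasional disagreement in the third decimal place, because the stored runtimes are themselves rounded to four decimals.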
eval_results.json CHANGED
@@ -1,10 +1,10 @@
  {
  "epoch": 1.0,
- "eval_accuracy": 0.4413527624330325,
- "eval_loss": 2.4912242889404297,
- "eval_runtime": 1.7817,
  "eval_samples": 129,
- "eval_samples_per_second": 72.402,
- "eval_steps_per_second": 9.541,
- "perplexity": 12.076051650319789
  }

  {
  "epoch": 1.0,
+ "eval_accuracy": 0.4077155652549501,
+ "eval_loss": 2.7608211040496826,
+ "eval_runtime": 1.7213,
  "eval_samples": 129,
+ "eval_samples_per_second": 74.942,
+ "eval_steps_per_second": 9.876,
+ "perplexity": 15.81282159096841
  }
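The `perplexity` field is consistent with perplexity = exp(eval_loss), where eval_loss is the mean per-token cross-entropy in nats. The Transformers `run_clm.py` example script computes it this way, but whether this run used that script is an assumption. A minimal sketch:

```python
import math


def perplexity_from_loss(mean_ce_loss: float) -> float:
    """Perplexity of a causal LM: exp of the mean per-token cross-entropy (nats)."""
    try:
        return math.exp(mean_ce_loss)
    except OverflowError:
        # math.exp raises for arguments above roughly 709.78
        return float("inf")


new = perplexity_from_loss(2.7608211040496826)  # new eval_loss
old = perplexity_from_loss(2.4912242889404297)  # old eval_loss
print(f"new: {new:.4f}  old: {old:.4f}")
```

Both the old pair (2.4912 → about 12.08) and the new pair (2.7608 → about 15.81) satisfy this relation. The rise in perplexity therefore carries the same information as the increase in evaluation loss.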
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bb40c9bdf099a92651835537db6b802001ee4136b8b3ec0472fca0082aba72d7
  size 929067029

  version https://git-lfs.github.com/spec/v1
+ oid sha256:451f74d625df52f59037a15db1a90de07168091802624afed9235768562eba88
  size 929067029
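`pytorch_model.bin` is stored through Git LFS, so the diff only shows the pointer file. The pointer holds three fields: a spec `version` line, the SHA-256 `oid` of the real file, and its `size` in bytes. This sketch parses such a pointer and checks a downloaded copy against it. The local path in the commented-out call is hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into version, oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the pointer's byte size and SHA-256 digest."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so a ~930 MB checkpoint is never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


new_pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:451f74d625df52f59037a15db1a90de07168091802624afed9235768562eba88
size 929067029"""
)
print(new_pointer["size"], new_pointer["oid"][:12])
# matches_pointer(Path("pytorch_model.bin"), new_pointer)  # hypothetical local path
```

In this commit the `size` stays at 929067029 bytes while the `oid` changes. That is consistent with the weights being overwritten by a file of identical length but different contents.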
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
  "epoch": 1.0,
- "train_loss": 2.5232514588412474,
- "train_runtime": 5052.6511,
- "train_samples": 175957,
- "train_samples_per_second": 34.825,
- "train_steps_per_second": 4.353
  }

  {
  "epoch": 1.0,
+ "train_loss": 2.7717256223521947,
+ "train_runtime": 3102.8464,
+ "train_samples": 119233,
+ "train_samples_per_second": 38.427,
+ "train_steps_per_second": 1.201
  }
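The removed `log_history` entries in `trainer_state.json` below appear to follow `lr_scheduler_type: linear` with no warmup. With a base rate of 1e-4 and the old run's `global_step` of 21994, lr(step) = 1e-4 × (21994 − step) / 21994 reproduces the logged values. This sketch of that schedule is written to mirror the shape of `transformers.get_linear_schedule_with_warmup`. The zero-warmup setting is inferred from the logs and is not stated in the card.

```python
def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Old run: learning_rate 1e-4, global_step 21994, warmup assumed to be 0.
for step in (50, 1000, 11000):
    print(step, linear_lr(step, 1e-4, 21994))
```

For example, step 50 gives 1e-4 × 21944 / 21994 ≈ 9.9773e-05, which matches the first logged `learning_rate`.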
trainer_state.json CHANGED
@@ -1,2662 +1,472 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 0.9999715839665373,
5
  "eval_steps": 500,
6
- "global_step": 21994,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.0,
13
- "learning_rate": 9.977266527234701e-05,
14
- "loss": 2.6296,
15
  "step": 50
16
  },
17
  {
18
- "epoch": 0.0,
19
- "learning_rate": 9.954533054469402e-05,
20
- "loss": 2.6156,
21
  "step": 100
22
  },
23
  {
24
- "epoch": 0.01,
25
- "learning_rate": 9.931799581704102e-05,
26
- "loss": 2.6035,
27
  "step": 150
28
  },
29
  {
30
- "epoch": 0.01,
31
- "learning_rate": 9.909066108938801e-05,
32
- "loss": 2.5399,
33
  "step": 200
34
  },
35
  {
36
- "epoch": 0.01,
37
- "learning_rate": 9.886332636173502e-05,
38
- "loss": 2.5857,
39
  "step": 250
40
  },
41
  {
42
- "epoch": 0.01,
43
- "learning_rate": 9.863599163408202e-05,
44
- "loss": 2.6078,
45
  "step": 300
46
  },
47
  {
48
- "epoch": 0.02,
49
- "learning_rate": 9.840865690642903e-05,
50
- "loss": 2.5931,
51
  "step": 350
52
  },
53
  {
54
- "epoch": 0.02,
55
- "learning_rate": 9.818132217877604e-05,
56
- "loss": 2.5919,
57
  "step": 400
58
  },
59
  {
60
- "epoch": 0.02,
61
- "learning_rate": 9.795398745112304e-05,
62
- "loss": 2.59,
63
  "step": 450
64
  },
65
  {
66
- "epoch": 0.02,
67
- "learning_rate": 9.772665272347005e-05,
68
- "loss": 2.605,
69
  "step": 500
70
  },
71
  {
72
- "epoch": 0.03,
73
- "learning_rate": 9.749931799581704e-05,
74
- "loss": 2.6026,
75
  "step": 550
76
  },
77
  {
78
- "epoch": 0.03,
79
- "learning_rate": 9.727198326816404e-05,
80
- "loss": 2.5839,
81
  "step": 600
82
  },
83
  {
84
- "epoch": 0.03,
85
- "learning_rate": 9.704464854051105e-05,
86
- "loss": 2.5862,
87
  "step": 650
88
  },
89
  {
90
- "epoch": 0.03,
91
- "learning_rate": 9.681731381285806e-05,
92
- "loss": 2.609,
93
  "step": 700
94
  },
95
  {
96
- "epoch": 0.03,
97
- "learning_rate": 9.658997908520506e-05,
98
- "loss": 2.5759,
99
  "step": 750
100
  },
101
  {
102
- "epoch": 0.04,
103
- "learning_rate": 9.636264435755207e-05,
104
- "loss": 2.6046,
105
  "step": 800
106
  },
107
  {
108
- "epoch": 0.04,
109
- "learning_rate": 9.613530962989907e-05,
110
- "loss": 2.5811,
111
  "step": 850
112
  },
113
  {
114
- "epoch": 0.04,
115
- "learning_rate": 9.590797490224606e-05,
116
- "loss": 2.5797,
117
  "step": 900
118
  },
119
  {
120
- "epoch": 0.04,
121
- "learning_rate": 9.568064017459307e-05,
122
- "loss": 2.5867,
123
  "step": 950
124
  },
125
  {
126
- "epoch": 0.05,
127
- "learning_rate": 9.545330544694008e-05,
128
- "loss": 2.5927,
129
  "step": 1000
130
  },
131
  {
132
- "epoch": 0.05,
133
- "learning_rate": 9.522597071928708e-05,
134
- "loss": 2.568,
135
  "step": 1050
136
  },
137
  {
138
- "epoch": 0.05,
139
- "learning_rate": 9.499863599163409e-05,
140
- "loss": 2.6024,
141
  "step": 1100
142
  },
143
  {
144
- "epoch": 0.05,
145
- "learning_rate": 9.477130126398109e-05,
146
- "loss": 2.5936,
147
  "step": 1150
148
  },
149
  {
150
- "epoch": 0.05,
151
- "learning_rate": 9.45439665363281e-05,
152
- "loss": 2.605,
153
  "step": 1200
154
  },
155
  {
156
- "epoch": 0.06,
157
- "learning_rate": 9.431663180867509e-05,
158
- "loss": 2.5775,
159
  "step": 1250
160
  },
161
  {
162
- "epoch": 0.06,
163
- "learning_rate": 9.40892970810221e-05,
164
- "loss": 2.5752,
165
  "step": 1300
166
  },
167
  {
168
- "epoch": 0.06,
169
- "learning_rate": 9.38619623533691e-05,
170
- "loss": 2.5679,
171
  "step": 1350
172
  },
173
  {
174
- "epoch": 0.06,
175
- "learning_rate": 9.36346276257161e-05,
176
- "loss": 2.5856,
177
  "step": 1400
178
  },
179
  {
180
- "epoch": 0.07,
181
- "learning_rate": 9.340729289806311e-05,
182
- "loss": 2.5787,
183
  "step": 1450
184
  },
185
  {
186
- "epoch": 0.07,
187
- "learning_rate": 9.317995817041012e-05,
188
- "loss": 2.5875,
189
  "step": 1500
190
  },
191
  {
192
- "epoch": 0.07,
193
- "learning_rate": 9.295262344275712e-05,
194
- "loss": 2.5631,
195
  "step": 1550
196
  },
197
  {
198
- "epoch": 0.07,
199
- "learning_rate": 9.272528871510412e-05,
200
- "loss": 2.583,
201
  "step": 1600
202
  },
203
  {
204
- "epoch": 0.08,
205
- "learning_rate": 9.249795398745112e-05,
206
- "loss": 2.5609,
207
  "step": 1650
208
  },
209
  {
210
- "epoch": 0.08,
211
- "learning_rate": 9.227061925979813e-05,
212
- "loss": 2.587,
213
  "step": 1700
214
  },
215
  {
216
- "epoch": 0.08,
217
- "learning_rate": 9.204328453214513e-05,
218
- "loss": 2.5555,
219
  "step": 1750
220
  },
221
  {
222
- "epoch": 0.08,
223
- "learning_rate": 9.181594980449214e-05,
224
- "loss": 2.5488,
225
  "step": 1800
226
  },
227
  {
228
- "epoch": 0.08,
229
- "learning_rate": 9.158861507683914e-05,
230
- "loss": 2.5554,
231
  "step": 1850
232
  },
233
  {
234
- "epoch": 0.09,
235
- "learning_rate": 9.136128034918615e-05,
236
- "loss": 2.5408,
237
  "step": 1900
238
  },
239
  {
240
- "epoch": 0.09,
241
- "learning_rate": 9.113394562153314e-05,
242
- "loss": 2.582,
243
  "step": 1950
244
  },
245
  {
246
- "epoch": 0.09,
247
- "learning_rate": 9.090661089388015e-05,
248
- "loss": 2.5533,
249
  "step": 2000
250
  },
251
  {
252
- "epoch": 0.09,
253
- "learning_rate": 9.067927616622715e-05,
254
- "loss": 2.5432,
255
  "step": 2050
256
  },
257
  {
258
- "epoch": 0.1,
259
- "learning_rate": 9.045194143857416e-05,
260
- "loss": 2.5867,
261
  "step": 2100
262
  },
263
  {
264
- "epoch": 0.1,
265
- "learning_rate": 9.022460671092116e-05,
266
- "loss": 2.5343,
267
  "step": 2150
268
  },
269
  {
270
- "epoch": 0.1,
271
- "learning_rate": 8.999727198326817e-05,
272
- "loss": 2.585,
273
  "step": 2200
274
  },
275
  {
276
- "epoch": 0.1,
277
- "learning_rate": 8.976993725561517e-05,
278
- "loss": 2.5679,
279
  "step": 2250
280
  },
281
  {
282
- "epoch": 0.1,
283
- "learning_rate": 8.954260252796217e-05,
284
- "loss": 2.5515,
285
  "step": 2300
286
  },
287
  {
288
- "epoch": 0.11,
289
- "learning_rate": 8.931526780030917e-05,
290
- "loss": 2.5713,
291
  "step": 2350
292
  },
293
  {
294
- "epoch": 0.11,
295
- "learning_rate": 8.908793307265618e-05,
296
- "loss": 2.5587,
297
  "step": 2400
298
  },
299
  {
300
- "epoch": 0.11,
301
- "learning_rate": 8.886059834500318e-05,
302
- "loss": 2.5774,
303
  "step": 2450
304
  },
305
  {
306
- "epoch": 0.11,
307
- "learning_rate": 8.863326361735019e-05,
308
- "loss": 2.551,
309
  "step": 2500
310
  },
311
  {
312
- "epoch": 0.12,
313
- "learning_rate": 8.84059288896972e-05,
314
- "loss": 2.5685,
315
  "step": 2550
316
  },
317
  {
318
- "epoch": 0.12,
319
- "learning_rate": 8.81785941620442e-05,
320
- "loss": 2.5707,
321
  "step": 2600
322
  },
323
  {
324
- "epoch": 0.12,
325
- "learning_rate": 8.795125943439119e-05,
326
- "loss": 2.568,
327
  "step": 2650
328
  },
329
  {
330
- "epoch": 0.12,
331
- "learning_rate": 8.77239247067382e-05,
332
- "loss": 2.5536,
333
  "step": 2700
334
  },
335
  {
336
- "epoch": 0.13,
337
- "learning_rate": 8.74965899790852e-05,
338
- "loss": 2.5406,
339
  "step": 2750
340
  },
341
  {
342
- "epoch": 0.13,
343
- "learning_rate": 8.726925525143221e-05,
344
- "loss": 2.5572,
345
  "step": 2800
346
  },
347
  {
348
- "epoch": 0.13,
349
- "learning_rate": 8.704192052377921e-05,
350
- "loss": 2.5749,
351
  "step": 2850
352
  },
353
  {
354
- "epoch": 0.13,
355
- "learning_rate": 8.681458579612622e-05,
356
- "loss": 2.5607,
357
  "step": 2900
358
  },
359
  {
360
- "epoch": 0.13,
361
- "learning_rate": 8.658725106847322e-05,
362
- "loss": 2.5612,
363
  "step": 2950
364
  },
365
  {
366
- "epoch": 0.14,
367
- "learning_rate": 8.635991634082023e-05,
368
- "loss": 2.5626,
369
  "step": 3000
370
  },
371
  {
372
- "epoch": 0.14,
373
- "learning_rate": 8.613258161316724e-05,
374
- "loss": 2.5677,
375
  "step": 3050
376
  },
377
  {
378
- "epoch": 0.14,
379
- "learning_rate": 8.590524688551423e-05,
380
- "loss": 2.5072,
381
  "step": 3100
382
  },
383
  {
384
- "epoch": 0.14,
385
- "learning_rate": 8.567791215786123e-05,
386
- "loss": 2.562,
387
  "step": 3150
388
  },
389
  {
390
- "epoch": 0.15,
391
- "learning_rate": 8.545057743020824e-05,
392
- "loss": 2.5786,
393
  "step": 3200
394
  },
395
  {
396
- "epoch": 0.15,
397
- "learning_rate": 8.522324270255524e-05,
398
- "loss": 2.5388,
399
  "step": 3250
400
  },
401
  {
402
- "epoch": 0.15,
403
- "learning_rate": 8.499590797490225e-05,
404
- "loss": 2.538,
405
  "step": 3300
406
  },
407
  {
408
- "epoch": 0.15,
409
- "learning_rate": 8.476857324724926e-05,
410
- "loss": 2.5448,
411
  "step": 3350
412
  },
413
  {
414
- "epoch": 0.15,
415
- "learning_rate": 8.454123851959626e-05,
416
- "loss": 2.5306,
417
  "step": 3400
418
  },
419
  {
420
- "epoch": 0.16,
421
- "learning_rate": 8.431390379194327e-05,
422
- "loss": 2.5647,
423
  "step": 3450
424
  },
425
  {
426
- "epoch": 0.16,
427
- "learning_rate": 8.408656906429027e-05,
428
- "loss": 2.5386,
429
  "step": 3500
430
  },
431
  {
432
- "epoch": 0.16,
433
- "learning_rate": 8.385923433663728e-05,
434
- "loss": 2.5376,
435
  "step": 3550
436
  },
437
  {
438
- "epoch": 0.16,
439
- "learning_rate": 8.363189960898427e-05,
440
- "loss": 2.535,
441
  "step": 3600
442
  },
443
  {
444
- "epoch": 0.17,
445
- "learning_rate": 8.340456488133128e-05,
446
- "loss": 2.5476,
447
  "step": 3650
448
  },
449
  {
450
- "epoch": 0.17,
451
- "learning_rate": 8.317723015367828e-05,
452
- "loss": 2.5462,
453
  "step": 3700
454
  },
455
- {
456
- "epoch": 0.17,
457
- "learning_rate": 8.294989542602529e-05,
458
- "loss": 2.5795,
459
- "step": 3750
460
- },
461
- {
462
- "epoch": 0.17,
463
- "learning_rate": 8.272256069837229e-05,
464
- "loss": 2.5425,
465
- "step": 3800
466
- },
467
- {
468
- "epoch": 0.18,
469
- "learning_rate": 8.24952259707193e-05,
470
- "loss": 2.5663,
471
- "step": 3850
472
- },
473
- {
474
- "epoch": 0.18,
475
- "learning_rate": 8.22678912430663e-05,
476
- "loss": 2.5376,
477
- "step": 3900
478
- },
479
- {
480
- "epoch": 0.18,
481
- "learning_rate": 8.204055651541331e-05,
482
- "loss": 2.5695,
483
- "step": 3950
484
- },
485
- {
486
- "epoch": 0.18,
487
- "learning_rate": 8.18132217877603e-05,
488
- "loss": 2.5112,
489
- "step": 4000
490
- },
491
- {
492
- "epoch": 0.18,
493
- "learning_rate": 8.15858870601073e-05,
494
- "loss": 2.5255,
495
- "step": 4050
496
- },
497
- {
498
- "epoch": 0.19,
499
- "learning_rate": 8.135855233245431e-05,
500
- "loss": 2.5448,
501
- "step": 4100
502
- },
503
- {
504
- "epoch": 0.19,
505
- "learning_rate": 8.113121760480132e-05,
506
- "loss": 2.5483,
507
- "step": 4150
508
- },
509
- {
510
- "epoch": 0.19,
511
- "learning_rate": 8.090388287714832e-05,
512
- "loss": 2.5319,
513
- "step": 4200
514
- },
515
- {
516
- "epoch": 0.19,
517
- "learning_rate": 8.067654814949533e-05,
518
- "loss": 2.5655,
519
- "step": 4250
520
- },
521
- {
522
- "epoch": 0.2,
523
- "learning_rate": 8.044921342184233e-05,
524
- "loss": 2.5399,
525
- "step": 4300
526
- },
527
- {
528
- "epoch": 0.2,
529
- "learning_rate": 8.022187869418933e-05,
530
- "loss": 2.5485,
531
- "step": 4350
532
- },
533
- {
534
- "epoch": 0.2,
535
- "learning_rate": 7.999454396653633e-05,
536
- "loss": 2.5196,
537
- "step": 4400
538
- },
539
- {
540
- "epoch": 0.2,
541
- "learning_rate": 7.976720923888334e-05,
542
- "loss": 2.5554,
543
- "step": 4450
544
- },
545
- {
546
- "epoch": 0.2,
547
- "learning_rate": 7.953987451123034e-05,
548
- "loss": 2.583,
549
- "step": 4500
550
- },
551
- {
552
- "epoch": 0.21,
553
- "learning_rate": 7.931253978357735e-05,
554
- "loss": 2.5643,
555
- "step": 4550
556
- },
557
- {
558
- "epoch": 0.21,
559
- "learning_rate": 7.908520505592435e-05,
560
- "loss": 2.5345,
561
- "step": 4600
562
- },
563
- {
564
- "epoch": 0.21,
565
- "learning_rate": 7.885787032827136e-05,
566
- "loss": 2.5393,
567
- "step": 4650
568
- },
569
- {
570
- "epoch": 0.21,
571
- "learning_rate": 7.863053560061835e-05,
572
- "loss": 2.5349,
573
- "step": 4700
574
- },
575
- {
576
- "epoch": 0.22,
577
- "learning_rate": 7.840320087296536e-05,
578
- "loss": 2.54,
579
- "step": 4750
580
- },
581
- {
582
- "epoch": 0.22,
583
- "learning_rate": 7.817586614531236e-05,
584
- "loss": 2.5526,
585
- "step": 4800
586
- },
587
- {
588
- "epoch": 0.22,
589
- "learning_rate": 7.794853141765937e-05,
590
- "loss": 2.5419,
591
- "step": 4850
592
- },
593
- {
594
- "epoch": 0.22,
595
- "learning_rate": 7.772119669000637e-05,
596
- "loss": 2.5122,
597
- "step": 4900
598
- },
599
- {
600
- "epoch": 0.23,
601
- "learning_rate": 7.749386196235338e-05,
602
- "loss": 2.5247,
603
- "step": 4950
604
- },
605
- {
606
- "epoch": 0.23,
607
- "learning_rate": 7.726652723470039e-05,
608
- "loss": 2.5516,
609
- "step": 5000
610
- },
611
- {
612
- "epoch": 0.23,
613
- "learning_rate": 7.703919250704738e-05,
614
- "loss": 2.5321,
615
- "step": 5050
616
- },
617
- {
618
- "epoch": 0.23,
619
- "learning_rate": 7.681185777939438e-05,
620
- "loss": 2.5453,
621
- "step": 5100
622
- },
623
- {
624
- "epoch": 0.23,
625
- "learning_rate": 7.658452305174139e-05,
626
- "loss": 2.5453,
627
- "step": 5150
628
- },
629
- {
630
- "epoch": 0.24,
631
- "learning_rate": 7.63571883240884e-05,
632
- "loss": 2.5522,
633
- "step": 5200
634
- },
635
- {
636
- "epoch": 0.24,
637
- "learning_rate": 7.61298535964354e-05,
638
- "loss": 2.5417,
639
- "step": 5250
640
- },
641
- {
642
- "epoch": 0.24,
643
- "learning_rate": 7.59025188687824e-05,
644
- "loss": 2.5241,
645
- "step": 5300
646
- },
647
- {
648
- "epoch": 0.24,
649
- "learning_rate": 7.567518414112941e-05,
650
- "loss": 2.5574,
651
- "step": 5350
652
- },
653
- {
654
- "epoch": 0.25,
655
- "learning_rate": 7.54478494134764e-05,
656
- "loss": 2.5127,
657
- "step": 5400
658
- },
659
- {
660
- "epoch": 0.25,
661
- "learning_rate": 7.522051468582341e-05,
662
- "loss": 2.5346,
663
- "step": 5450
664
- },
665
- {
666
- "epoch": 0.25,
667
- "learning_rate": 7.499317995817041e-05,
668
- "loss": 2.5164,
669
- "step": 5500
670
- },
671
- {
672
- "epoch": 0.25,
673
- "learning_rate": 7.476584523051742e-05,
674
- "loss": 2.5571,
675
- "step": 5550
676
- },
677
- {
678
- "epoch": 0.25,
679
- "learning_rate": 7.453851050286442e-05,
680
- "loss": 2.5455,
681
- "step": 5600
682
- },
683
- {
684
- "epoch": 0.26,
685
- "learning_rate": 7.431117577521143e-05,
686
- "loss": 2.544,
687
- "step": 5650
688
- },
689
- {
690
- "epoch": 0.26,
691
- "learning_rate": 7.408384104755844e-05,
692
- "loss": 2.5271,
693
- "step": 5700
694
- },
695
- {
696
- "epoch": 0.26,
697
- "learning_rate": 7.385650631990543e-05,
698
- "loss": 2.525,
699
- "step": 5750
700
- },
701
- {
702
- "epoch": 0.26,
703
- "learning_rate": 7.362917159225243e-05,
704
- "loss": 2.5278,
705
- "step": 5800
706
- },
707
- {
708
- "epoch": 0.27,
709
- "learning_rate": 7.340183686459944e-05,
710
- "loss": 2.5161,
711
- "step": 5850
712
- },
713
- {
714
- "epoch": 0.27,
715
- "learning_rate": 7.317450213694644e-05,
716
- "loss": 2.5296,
717
- "step": 5900
718
- },
719
- {
720
- "epoch": 0.27,
721
- "learning_rate": 7.294716740929345e-05,
722
- "loss": 2.5454,
723
- "step": 5950
724
- },
725
- {
726
- "epoch": 0.27,
727
- "learning_rate": 7.271983268164046e-05,
728
- "loss": 2.5319,
729
- "step": 6000
730
- },
731
- {
732
- "epoch": 0.28,
733
- "learning_rate": 7.249249795398746e-05,
734
- "loss": 2.5282,
735
- "step": 6050
736
- },
737
- {
738
- "epoch": 0.28,
739
- "learning_rate": 7.226516322633445e-05,
740
- "loss": 2.5359,
741
- "step": 6100
742
- },
743
- {
744
- "epoch": 0.28,
745
- "learning_rate": 7.203782849868146e-05,
746
- "loss": 2.494,
747
- "step": 6150
748
- },
749
- {
750
- "epoch": 0.28,
751
- "learning_rate": 7.181049377102846e-05,
752
- "loss": 2.5289,
753
- "step": 6200
754
- },
755
- {
756
- "epoch": 0.28,
757
- "learning_rate": 7.158315904337547e-05,
758
- "loss": 2.4985,
759
- "step": 6250
760
- },
761
- {
762
- "epoch": 0.29,
763
- "learning_rate": 7.135582431572248e-05,
764
- "loss": 2.5156,
765
- "step": 6300
766
- },
767
- {
768
- "epoch": 0.29,
769
- "learning_rate": 7.112848958806948e-05,
770
- "loss": 2.53,
771
- "step": 6350
772
- },
773
- {
774
- "epoch": 0.29,
775
- "learning_rate": 7.090115486041649e-05,
776
- "loss": 2.5157,
777
- "step": 6400
778
- },
779
- {
780
- "epoch": 0.29,
781
- "learning_rate": 7.067382013276348e-05,
782
- "loss": 2.5303,
783
- "step": 6450
784
- },
785
- {
786
- "epoch": 0.3,
787
- "learning_rate": 7.044648540511048e-05,
788
- "loss": 2.5286,
789
- "step": 6500
790
- },
791
- {
792
- "epoch": 0.3,
793
- "learning_rate": 7.021915067745749e-05,
794
- "loss": 2.5039,
795
- "step": 6550
796
- },
797
- {
798
- "epoch": 0.3,
799
- "learning_rate": 6.99918159498045e-05,
800
- "loss": 2.5161,
801
- "step": 6600
802
- },
803
- {
804
- "epoch": 0.3,
805
- "learning_rate": 6.97644812221515e-05,
806
- "loss": 2.5105,
807
- "step": 6650
808
- },
809
- {
810
- "epoch": 0.3,
811
- "learning_rate": 6.95371464944985e-05,
812
- "loss": 2.5151,
813
- "step": 6700
814
- },
815
- {
816
- "epoch": 0.31,
817
- "learning_rate": 6.930981176684551e-05,
818
- "loss": 2.5425,
819
- "step": 6750
820
- },
821
- {
822
- "epoch": 0.31,
823
- "learning_rate": 6.90824770391925e-05,
824
- "loss": 2.5357,
825
- "step": 6800
826
- },
827
- {
828
- "epoch": 0.31,
829
- "learning_rate": 6.885514231153951e-05,
830
- "loss": 2.4989,
831
- "step": 6850
832
- },
833
- {
834
- "epoch": 0.31,
835
- "learning_rate": 6.862780758388652e-05,
836
- "loss": 2.5413,
837
- "step": 6900
838
- },
839
- {
840
- "epoch": 0.32,
841
- "learning_rate": 6.840047285623352e-05,
842
- "loss": 2.4909,
843
- "step": 6950
844
- },
845
- {
846
- "epoch": 0.32,
847
- "learning_rate": 6.817313812858053e-05,
848
- "loss": 2.5177,
849
- "step": 7000
850
- },
851
- {
852
- "epoch": 0.32,
853
- "learning_rate": 6.794580340092753e-05,
854
- "loss": 2.5107,
855
- "step": 7050
856
- },
857
- {
858
- "epoch": 0.32,
859
- "learning_rate": 6.771846867327454e-05,
860
- "loss": 2.5343,
861
- "step": 7100
862
- },
863
- {
864
- "epoch": 0.33,
865
- "learning_rate": 6.749113394562153e-05,
866
- "loss": 2.5247,
867
- "step": 7150
868
- },
869
- {
870
- "epoch": 0.33,
871
- "learning_rate": 6.726379921796854e-05,
872
- "loss": 2.5202,
873
- "step": 7200
874
- },
875
- {
876
- "epoch": 0.33,
877
- "learning_rate": 6.703646449031554e-05,
878
- "loss": 2.5156,
879
- "step": 7250
880
- },
881
- {
882
- "epoch": 0.33,
883
- "learning_rate": 6.680912976266255e-05,
884
- "loss": 2.5431,
885
- "step": 7300
886
- },
887
- {
888
- "epoch": 0.33,
889
- "learning_rate": 6.658179503500955e-05,
890
- "loss": 2.5221,
891
- "step": 7350
892
- },
893
- {
894
- "epoch": 0.34,
895
- "learning_rate": 6.635446030735656e-05,
896
- "loss": 2.516,
897
- "step": 7400
898
- },
899
- {
900
- "epoch": 0.34,
901
- "learning_rate": 6.612712557970356e-05,
902
- "loss": 2.5297,
903
- "step": 7450
904
- },
905
- {
906
- "epoch": 0.34,
907
- "learning_rate": 6.589979085205056e-05,
908
- "loss": 2.5052,
909
- "step": 7500
910
- },
911
- {
912
- "epoch": 0.34,
913
- "learning_rate": 6.567245612439756e-05,
914
- "loss": 2.4981,
915
- "step": 7550
916
- },
917
- {
918
- "epoch": 0.35,
919
- "learning_rate": 6.544512139674457e-05,
920
- "loss": 2.5292,
921
- "step": 7600
922
- },
923
- {
924
- "epoch": 0.35,
925
- "learning_rate": 6.521778666909157e-05,
926
- "loss": 2.4853,
927
- "step": 7650
928
- },
929
- {
930
- "epoch": 0.35,
931
- "learning_rate": 6.499045194143858e-05,
932
- "loss": 2.5181,
933
- "step": 7700
934
- },
935
- {
936
- "epoch": 0.35,
937
- "learning_rate": 6.476311721378558e-05,
938
- "loss": 2.5599,
939
- "step": 7750
940
- },
941
- {
942
- "epoch": 0.35,
943
- "learning_rate": 6.453578248613259e-05,
944
- "loss": 2.5093,
945
- "step": 7800
946
- },
947
- {
948
- "epoch": 0.36,
949
- "learning_rate": 6.430844775847958e-05,
950
- "loss": 2.5449,
951
- "step": 7850
952
- },
953
- {
954
- "epoch": 0.36,
955
- "learning_rate": 6.408111303082659e-05,
956
- "loss": 2.5013,
957
- "step": 7900
958
- },
959
- {
960
- "epoch": 0.36,
961
- "learning_rate": 6.385377830317359e-05,
962
- "loss": 2.5366,
963
- "step": 7950
964
- },
965
- {
966
- "epoch": 0.36,
967
- "learning_rate": 6.36264435755206e-05,
968
- "loss": 2.49,
969
- "step": 8000
970
- },
971
- {
972
- "epoch": 0.37,
973
- "learning_rate": 6.33991088478676e-05,
974
- "loss": 2.5496,
975
- "step": 8050
976
- },
977
- {
978
- "epoch": 0.37,
979
- "learning_rate": 6.317177412021461e-05,
980
- "loss": 2.5071,
981
- "step": 8100
982
- },
983
- {
984
- "epoch": 0.37,
985
- "learning_rate": 6.294443939256161e-05,
986
- "loss": 2.5374,
987
- "step": 8150
988
- },
989
- {
990
- "epoch": 0.37,
991
- "learning_rate": 6.27171046649086e-05,
992
- "loss": 2.5284,
993
- "step": 8200
994
- },
995
- {
996
- "epoch": 0.38,
997
- "learning_rate": 6.248976993725561e-05,
998
- "loss": 2.4792,
999
- "step": 8250
1000
- },
1001
- {
1002
- "epoch": 0.38,
1003
- "learning_rate": 6.226243520960262e-05,
1004
- "loss": 2.5061,
1005
- "step": 8300
1006
- },
1007
- {
1008
- "epoch": 0.38,
1009
- "learning_rate": 6.203510048194962e-05,
1010
- "loss": 2.5183,
1011
- "step": 8350
1012
- },
1013
- {
1014
- "epoch": 0.38,
1015
- "learning_rate": 6.180776575429663e-05,
1016
- "loss": 2.4886,
1017
- "step": 8400
1018
- },
1019
- {
1020
- "epoch": 0.38,
1021
- "learning_rate": 6.158043102664363e-05,
1022
- "loss": 2.5116,
1023
- "step": 8450
1024
- },
1025
- {
1026
- "epoch": 0.39,
1027
- "learning_rate": 6.135309629899064e-05,
1028
- "loss": 2.5364,
1029
- "step": 8500
1030
- },
1031
- {
1032
- "epoch": 0.39,
1033
- "learning_rate": 6.112576157133763e-05,
1034
- "loss": 2.5205,
1035
- "step": 8550
1036
- },
1037
- {
1038
- "epoch": 0.39,
1039
- "learning_rate": 6.0898426843684644e-05,
1040
- "loss": 2.5125,
1041
- "step": 8600
1042
- },
1043
- {
1044
- "epoch": 0.39,
1045
- "learning_rate": 6.067109211603165e-05,
1046
- "loss": 2.5089,
1047
- "step": 8650
1048
- },
1049
- {
1050
- "epoch": 0.4,
1051
- "learning_rate": 6.0443757388378655e-05,
1052
- "loss": 2.5088,
1053
- "step": 8700
1054
- },
1055
- {
1056
- "epoch": 0.4,
1057
- "learning_rate": 6.021642266072566e-05,
1058
- "loss": 2.5264,
1059
- "step": 8750
1060
- },
1061
- {
1062
- "epoch": 0.4,
1063
- "learning_rate": 5.9989087933072666e-05,
1064
- "loss": 2.5045,
1065
- "step": 8800
1066
- },
1067
- {
1068
- "epoch": 0.4,
1069
- "learning_rate": 5.976175320541967e-05,
1070
- "loss": 2.5085,
1071
- "step": 8850
1072
- },
1073
- {
1074
- "epoch": 0.4,
1075
- "learning_rate": 5.9534418477766663e-05,
1076
- "loss": 2.4801,
1077
- "step": 8900
1078
- },
1079
- {
1080
- "epoch": 0.41,
1081
- "learning_rate": 5.930708375011367e-05,
1082
- "loss": 2.5017,
1083
- "step": 8950
1084
- },
1085
- {
1086
- "epoch": 0.41,
1087
- "learning_rate": 5.9079749022460675e-05,
1088
- "loss": 2.5109,
1089
- "step": 9000
1090
- },
1091
- {
1092
- "epoch": 0.41,
1093
- "learning_rate": 5.885241429480768e-05,
1094
- "loss": 2.5052,
1095
- "step": 9050
1096
- },
1097
- {
1098
- "epoch": 0.41,
1099
- "learning_rate": 5.8625079567154686e-05,
1100
- "loss": 2.5139,
1101
- "step": 9100
1102
- },
1103
- {
1104
- "epoch": 0.42,
1105
- "learning_rate": 5.839774483950169e-05,
1106
- "loss": 2.4941,
1107
- "step": 9150
1108
- },
1109
- {
1110
- "epoch": 0.42,
1111
- "learning_rate": 5.817041011184868e-05,
1112
- "loss": 2.5137,
1113
- "step": 9200
1114
- },
1115
- {
1116
- "epoch": 0.42,
1117
- "learning_rate": 5.794307538419569e-05,
1118
- "loss": 2.5101,
1119
- "step": 9250
1120
- },
1121
- {
1122
- "epoch": 0.42,
1123
- "learning_rate": 5.7715740656542694e-05,
1124
- "loss": 2.5009,
1125
- "step": 9300
1126
- },
1127
- {
1128
- "epoch": 0.43,
1129
- "learning_rate": 5.74884059288897e-05,
1130
- "loss": 2.5395,
1131
- "step": 9350
1132
- },
1133
- {
1134
- "epoch": 0.43,
1135
- "learning_rate": 5.7261071201236706e-05,
1136
- "loss": 2.5108,
1137
- "step": 9400
1138
- },
1139
- {
1140
- "epoch": 0.43,
1141
- "learning_rate": 5.703373647358371e-05,
1142
- "loss": 2.5238,
1143
- "step": 9450
1144
- },
1145
- {
1146
- "epoch": 0.43,
1147
- "learning_rate": 5.680640174593072e-05,
1148
- "loss": 2.5037,
1149
- "step": 9500
1150
- },
1151
- {
1152
- "epoch": 0.43,
1153
- "learning_rate": 5.657906701827771e-05,
1154
- "loss": 2.5038,
1155
- "step": 9550
1156
- },
1157
- {
1158
- "epoch": 0.44,
1159
- "learning_rate": 5.6351732290624714e-05,
1160
- "loss": 2.5324,
1161
- "step": 9600
1162
- },
1163
- {
1164
- "epoch": 0.44,
1165
- "learning_rate": 5.612439756297172e-05,
1166
- "loss": 2.5054,
1167
- "step": 9650
1168
- },
1169
- {
1170
- "epoch": 0.44,
1171
- "learning_rate": 5.5897062835318725e-05,
1172
- "loss": 2.5119,
1173
- "step": 9700
1174
- },
1175
- {
1176
- "epoch": 0.44,
1177
- "learning_rate": 5.566972810766573e-05,
1178
- "loss": 2.5214,
1179
- "step": 9750
1180
- },
1181
- {
1182
- "epoch": 0.45,
1183
- "learning_rate": 5.5442393380012737e-05,
1184
- "loss": 2.5404,
1185
- "step": 9800
1186
- },
1187
- {
1188
- "epoch": 0.45,
1189
- "learning_rate": 5.521505865235974e-05,
1190
- "loss": 2.516,
1191
- "step": 9850
1192
- },
1193
- {
1194
- "epoch": 0.45,
1195
- "learning_rate": 5.4987723924706734e-05,
1196
- "loss": 2.5166,
1197
- "step": 9900
1198
- },
1199
- {
1200
- "epoch": 0.45,
1201
- "learning_rate": 5.476038919705374e-05,
1202
- "loss": 2.4983,
1203
- "step": 9950
1204
- },
1205
- {
1206
- "epoch": 0.45,
1207
- "learning_rate": 5.4533054469400745e-05,
1208
- "loss": 2.5101,
1209
- "step": 10000
1210
- },
1211
- {
1212
- "epoch": 0.46,
1213
- "learning_rate": 5.430571974174775e-05,
1214
- "loss": 2.4998,
1215
- "step": 10050
1216
- },
1217
- {
1218
- "epoch": 0.46,
1219
- "learning_rate": 5.4078385014094756e-05,
1220
- "loss": 2.5116,
1221
- "step": 10100
1222
- },
1223
- {
1224
- "epoch": 0.46,
1225
- "learning_rate": 5.385105028644176e-05,
1226
- "loss": 2.5136,
1227
- "step": 10150
1228
- },
1229
- {
1230
- "epoch": 0.46,
1231
- "learning_rate": 5.362371555878877e-05,
1232
- "loss": 2.5313,
1233
- "step": 10200
1234
- },
1235
- {
1236
- "epoch": 0.47,
1237
- "learning_rate": 5.339638083113576e-05,
1238
- "loss": 2.4989,
1239
- "step": 10250
1240
- },
1241
- {
1242
- "epoch": 0.47,
1243
- "learning_rate": 5.3169046103482765e-05,
1244
- "loss": 2.5062,
1245
- "step": 10300
1246
- },
1247
- {
1248
- "epoch": 0.47,
1249
- "learning_rate": 5.294171137582977e-05,
1250
- "loss": 2.531,
1251
- "step": 10350
1252
- },
1253
- {
1254
- "epoch": 0.47,
1255
- "learning_rate": 5.2714376648176776e-05,
1256
- "loss": 2.4975,
1257
- "step": 10400
1258
- },
1259
- {
1260
- "epoch": 0.48,
1261
- "learning_rate": 5.248704192052378e-05,
1262
- "loss": 2.4922,
1263
- "step": 10450
1264
- },
1265
- {
1266
- "epoch": 0.48,
1267
- "learning_rate": 5.225970719287079e-05,
1268
- "loss": 2.5128,
1269
- "step": 10500
1270
- },
1271
- {
1272
- "epoch": 0.48,
1273
- "learning_rate": 5.203237246521779e-05,
1274
- "loss": 2.504,
1275
- "step": 10550
1276
- },
1277
- {
1278
- "epoch": 0.48,
1279
- "learning_rate": 5.1805037737564785e-05,
1280
- "loss": 2.5093,
1281
- "step": 10600
1282
- },
1283
- {
1284
- "epoch": 0.48,
1285
- "learning_rate": 5.157770300991179e-05,
1286
- "loss": 2.491,
1287
- "step": 10650
1288
- },
1289
- {
1290
- "epoch": 0.49,
1291
- "learning_rate": 5.1350368282258796e-05,
1292
- "loss": 2.5008,
1293
- "step": 10700
1294
- },
1295
- {
1296
- "epoch": 0.49,
1297
- "learning_rate": 5.11230335546058e-05,
1298
- "loss": 2.5103,
1299
- "step": 10750
1300
- },
1301
- {
1302
- "epoch": 0.49,
1303
- "learning_rate": 5.089569882695281e-05,
1304
- "loss": 2.5167,
1305
- "step": 10800
1306
- },
1307
- {
1308
- "epoch": 0.49,
1309
- "learning_rate": 5.066836409929981e-05,
1310
- "loss": 2.5062,
1311
- "step": 10850
1312
- },
1313
- {
1314
- "epoch": 0.5,
1315
- "learning_rate": 5.044102937164682e-05,
1316
- "loss": 2.5135,
1317
- "step": 10900
1318
- },
1319
- {
1320
- "epoch": 0.5,
1321
- "learning_rate": 5.021369464399382e-05,
1322
- "loss": 2.489,
1323
- "step": 10950
1324
- },
1325
- {
1326
- "epoch": 0.5,
1327
- "learning_rate": 4.998635991634082e-05,
1328
- "loss": 2.5071,
1329
- "step": 11000
1330
- },
1331
- {
1332
- "epoch": 0.5,
1333
- "learning_rate": 4.975902518868782e-05,
1334
- "loss": 2.5181,
1335
- "step": 11050
1336
- },
1337
- {
1338
- "epoch": 0.5,
1339
- "learning_rate": 4.953169046103483e-05,
1340
- "loss": 2.4997,
1341
- "step": 11100
1342
- },
1343
- {
1344
- "epoch": 0.51,
1345
- "learning_rate": 4.930435573338183e-05,
1346
- "loss": 2.5127,
1347
- "step": 11150
1348
- },
1349
- {
1350
- "epoch": 0.51,
1351
- "learning_rate": 4.907702100572884e-05,
1352
- "loss": 2.4906,
1353
- "step": 11200
1354
- },
1355
- {
1356
- "epoch": 0.51,
1357
- "learning_rate": 4.884968627807584e-05,
1358
- "loss": 2.5129,
1359
- "step": 11250
1360
- },
1361
- {
1362
- "epoch": 0.51,
1363
- "learning_rate": 4.862235155042284e-05,
1364
- "loss": 2.5015,
1365
- "step": 11300
1366
- },
1367
- {
1368
- "epoch": 0.52,
1369
- "learning_rate": 4.839501682276985e-05,
1370
- "loss": 2.5049,
1371
- "step": 11350
1372
- },
1373
- {
1374
- "epoch": 0.52,
1375
- "learning_rate": 4.8167682095116854e-05,
1376
- "loss": 2.4971,
1377
- "step": 11400
1378
- },
1379
- {
1380
- "epoch": 0.52,
1381
- "learning_rate": 4.794034736746386e-05,
1382
- "loss": 2.5177,
1383
- "step": 11450
1384
- },
1385
- {
1386
- "epoch": 0.52,
1387
- "learning_rate": 4.771301263981086e-05,
1388
- "loss": 2.5056,
1389
- "step": 11500
1390
- },
1391
- {
1392
- "epoch": 0.53,
1393
- "learning_rate": 4.7485677912157864e-05,
1394
- "loss": 2.4831,
1395
- "step": 11550
1396
- },
1397
- {
1398
- "epoch": 0.53,
1399
- "learning_rate": 4.725834318450487e-05,
1400
- "loss": 2.4972,
1401
- "step": 11600
1402
- },
1403
- {
1404
- "epoch": 0.53,
1405
- "learning_rate": 4.7031008456851875e-05,
1406
- "loss": 2.5103,
1407
- "step": 11650
1408
- },
1409
- {
1410
- "epoch": 0.53,
1411
- "learning_rate": 4.680367372919888e-05,
1412
- "loss": 2.5083,
1413
- "step": 11700
1414
- },
1415
- {
1416
- "epoch": 0.53,
1417
- "learning_rate": 4.657633900154588e-05,
1418
- "loss": 2.5027,
1419
- "step": 11750
1420
- },
1421
- {
1422
- "epoch": 0.54,
1423
- "learning_rate": 4.6349004273892885e-05,
1424
- "loss": 2.4846,
1425
- "step": 11800
1426
- },
1427
- {
1428
- "epoch": 0.54,
1429
- "learning_rate": 4.612166954623989e-05,
1430
- "loss": 2.5193,
1431
- "step": 11850
1432
- },
1433
- {
1434
- "epoch": 0.54,
1435
- "learning_rate": 4.589433481858689e-05,
1436
- "loss": 2.5123,
1437
- "step": 11900
1438
- },
1439
- {
1440
- "epoch": 0.54,
1441
- "learning_rate": 4.5667000090933895e-05,
1442
- "loss": 2.5219,
1443
- "step": 11950
1444
- },
1445
- {
1446
- "epoch": 0.55,
1447
- "learning_rate": 4.54396653632809e-05,
1448
- "loss": 2.4979,
1449
- "step": 12000
1450
- },
1451
- {
1452
- "epoch": 0.55,
1453
- "learning_rate": 4.5212330635627906e-05,
1454
- "loss": 2.4849,
1455
- "step": 12050
1456
- },
1457
- {
1458
- "epoch": 0.55,
1459
- "learning_rate": 4.4984995907974905e-05,
1460
- "loss": 2.4783,
1461
- "step": 12100
1462
- },
1463
- {
1464
- "epoch": 0.55,
1465
- "learning_rate": 4.475766118032191e-05,
1466
- "loss": 2.5035,
1467
- "step": 12150
1468
- },
1469
- {
1470
- "epoch": 0.55,
1471
- "learning_rate": 4.4530326452668916e-05,
1472
- "loss": 2.4879,
1473
- "step": 12200
1474
- },
1475
- {
1476
- "epoch": 0.56,
1477
- "learning_rate": 4.4302991725015914e-05,
1478
- "loss": 2.4972,
1479
- "step": 12250
1480
- },
1481
- {
1482
- "epoch": 0.56,
1483
- "learning_rate": 4.407565699736292e-05,
1484
- "loss": 2.5043,
1485
- "step": 12300
1486
- },
1487
- {
1488
- "epoch": 0.56,
1489
- "learning_rate": 4.3848322269709926e-05,
1490
- "loss": 2.491,
1491
- "step": 12350
1492
- },
1493
- {
1494
- "epoch": 0.56,
1495
- "learning_rate": 4.362098754205693e-05,
1496
- "loss": 2.5032,
1497
- "step": 12400
1498
- },
1499
- {
1500
- "epoch": 0.57,
1501
- "learning_rate": 4.339365281440393e-05,
1502
- "loss": 2.5227,
1503
- "step": 12450
1504
- },
1505
- {
1506
- "epoch": 0.57,
1507
- "learning_rate": 4.3166318086750935e-05,
1508
- "loss": 2.5245,
1509
- "step": 12500
1510
- },
1511
- {
1512
- "epoch": 0.57,
1513
- "learning_rate": 4.293898335909794e-05,
1514
- "loss": 2.4927,
1515
- "step": 12550
1516
- },
1517
- {
1518
- "epoch": 0.57,
1519
- "learning_rate": 4.271164863144494e-05,
1520
- "loss": 2.5002,
1521
- "step": 12600
1522
- },
1523
- {
1524
- "epoch": 0.58,
1525
- "learning_rate": 4.2484313903791945e-05,
1526
- "loss": 2.4997,
1527
- "step": 12650
1528
- },
1529
- {
1530
- "epoch": 0.58,
1531
- "learning_rate": 4.225697917613895e-05,
1532
- "loss": 2.4939,
1533
- "step": 12700
1534
- },
1535
- {
1536
- "epoch": 0.58,
1537
- "learning_rate": 4.2029644448485957e-05,
1538
- "loss": 2.5223,
1539
- "step": 12750
1540
- },
1541
- {
1542
- "epoch": 0.58,
1543
- "learning_rate": 4.1802309720832955e-05,
1544
- "loss": 2.4963,
1545
- "step": 12800
1546
- },
1547
- {
1548
- "epoch": 0.58,
1549
- "learning_rate": 4.157497499317996e-05,
1550
- "loss": 2.5334,
1551
- "step": 12850
1552
- },
1553
- {
1554
- "epoch": 0.59,
1555
- "learning_rate": 4.1347640265526966e-05,
1556
- "loss": 2.5085,
1557
- "step": 12900
1558
- },
1559
- {
1560
- "epoch": 0.59,
1561
- "learning_rate": 4.1120305537873965e-05,
1562
- "loss": 2.4901,
1563
- "step": 12950
1564
- },
1565
- {
1566
- "epoch": 0.59,
1567
- "learning_rate": 4.089297081022097e-05,
1568
- "loss": 2.5268,
1569
- "step": 13000
1570
- },
1571
- {
1572
- "epoch": 0.59,
1573
- "learning_rate": 4.0665636082567976e-05,
1574
- "loss": 2.5237,
1575
- "step": 13050
1576
- },
1577
- {
1578
- "epoch": 0.6,
1579
- "learning_rate": 4.043830135491498e-05,
1580
- "loss": 2.4928,
1581
- "step": 13100
1582
- },
1583
- {
1584
- "epoch": 0.6,
1585
- "learning_rate": 4.021096662726198e-05,
1586
- "loss": 2.4852,
1587
- "step": 13150
1588
- },
1589
- {
1590
- "epoch": 0.6,
1591
- "learning_rate": 3.9983631899608986e-05,
1592
- "loss": 2.5139,
1593
- "step": 13200
1594
- },
1595
- {
1596
- "epoch": 0.6,
1597
- "learning_rate": 3.975629717195599e-05,
1598
- "loss": 2.5336,
1599
- "step": 13250
1600
- },
1601
- {
1602
- "epoch": 0.6,
1603
- "learning_rate": 3.952896244430299e-05,
1604
- "loss": 2.527,
1605
- "step": 13300
1606
- },
1607
- {
1608
- "epoch": 0.61,
1609
- "learning_rate": 3.9301627716649996e-05,
1610
- "loss": 2.4877,
1611
- "step": 13350
1612
- },
1613
- {
1614
- "epoch": 0.61,
1615
- "learning_rate": 3.9074292988997e-05,
1616
- "loss": 2.4992,
1617
- "step": 13400
1618
- },
1619
- {
1620
- "epoch": 0.61,
1621
- "learning_rate": 3.884695826134401e-05,
1622
- "loss": 2.4909,
1623
- "step": 13450
1624
- },
1625
- {
1626
- "epoch": 0.61,
1627
- "learning_rate": 3.8619623533691006e-05,
1628
- "loss": 2.4983,
1629
- "step": 13500
1630
- },
1631
- {
1632
- "epoch": 0.62,
1633
- "learning_rate": 3.839228880603801e-05,
1634
- "loss": 2.5146,
1635
- "step": 13550
1636
- },
1637
- {
1638
- "epoch": 0.62,
1639
- "learning_rate": 3.816495407838502e-05,
1640
- "loss": 2.5058,
1641
- "step": 13600
1642
- },
1643
- {
1644
- "epoch": 0.62,
1645
- "learning_rate": 3.7937619350732016e-05,
1646
- "loss": 2.4943,
1647
- "step": 13650
1648
- },
1649
- {
1650
- "epoch": 0.62,
1651
- "learning_rate": 3.771028462307902e-05,
1652
- "loss": 2.5002,
1653
- "step": 13700
1654
- },
1655
- {
1656
- "epoch": 0.63,
1657
- "learning_rate": 3.748294989542603e-05,
1658
- "loss": 2.4918,
1659
- "step": 13750
1660
- },
1661
- {
1662
- "epoch": 0.63,
1663
- "learning_rate": 3.7255615167773026e-05,
1664
- "loss": 2.4915,
1665
- "step": 13800
1666
- },
1667
- {
1668
- "epoch": 0.63,
1669
- "learning_rate": 3.702828044012003e-05,
1670
- "loss": 2.5089,
1671
- "step": 13850
1672
- },
1673
- {
1674
- "epoch": 0.63,
1675
- "learning_rate": 3.680094571246704e-05,
1676
- "loss": 2.5048,
1677
- "step": 13900
1678
- },
1679
- {
1680
- "epoch": 0.63,
1681
- "learning_rate": 3.657361098481404e-05,
1682
- "loss": 2.5108,
1683
- "step": 13950
1684
- },
1685
- {
1686
- "epoch": 0.64,
1687
- "learning_rate": 3.634627625716104e-05,
1688
- "loss": 2.4959,
1689
- "step": 14000
1690
- },
1691
- {
1692
- "epoch": 0.64,
1693
- "learning_rate": 3.611894152950805e-05,
1694
- "loss": 2.5154,
1695
- "step": 14050
1696
- },
1697
- {
1698
- "epoch": 0.64,
1699
- "learning_rate": 3.589160680185505e-05,
1700
- "loss": 2.5092,
1701
- "step": 14100
1702
- },
1703
- {
1704
- "epoch": 0.64,
1705
- "learning_rate": 3.566427207420205e-05,
1706
- "loss": 2.5265,
1707
- "step": 14150
1708
- },
1709
- {
1710
- "epoch": 0.65,
1711
- "learning_rate": 3.543693734654906e-05,
1712
- "loss": 2.4678,
1713
- "step": 14200
1714
- },
1715
- {
1716
- "epoch": 0.65,
1717
- "learning_rate": 3.520960261889606e-05,
1718
- "loss": 2.5236,
1719
- "step": 14250
1720
- },
1721
- {
1722
- "epoch": 0.65,
1723
- "learning_rate": 3.498226789124307e-05,
1724
- "loss": 2.5156,
1725
- "step": 14300
1726
- },
1727
- {
1728
- "epoch": 0.65,
1729
- "learning_rate": 3.475493316359007e-05,
1730
- "loss": 2.508,
1731
- "step": 14350
1732
- },
1733
- {
1734
- "epoch": 0.65,
1735
- "learning_rate": 3.452759843593707e-05,
1736
- "loss": 2.4949,
1737
- "step": 14400
1738
- },
1739
- {
1740
- "epoch": 0.66,
1741
- "learning_rate": 3.430026370828408e-05,
1742
- "loss": 2.4898,
1743
- "step": 14450
1744
- },
1745
- {
1746
- "epoch": 0.66,
1747
- "learning_rate": 3.4072928980631084e-05,
1748
- "loss": 2.5006,
1749
- "step": 14500
1750
- },
1751
- {
1752
- "epoch": 0.66,
1753
- "learning_rate": 3.384559425297808e-05,
1754
- "loss": 2.4878,
1755
- "step": 14550
1756
- },
1757
- {
1758
- "epoch": 0.66,
1759
- "learning_rate": 3.361825952532509e-05,
1760
- "loss": 2.5073,
1761
- "step": 14600
1762
- },
1763
- {
1764
- "epoch": 0.67,
1765
- "learning_rate": 3.3390924797672094e-05,
1766
- "loss": 2.5176,
1767
- "step": 14650
1768
- },
1769
- {
1770
- "epoch": 0.67,
1771
- "learning_rate": 3.31635900700191e-05,
1772
- "loss": 2.5078,
1773
- "step": 14700
1774
- },
1775
- {
1776
- "epoch": 0.67,
1777
- "learning_rate": 3.2936255342366105e-05,
1778
- "loss": 2.5101,
1779
- "step": 14750
1780
- },
1781
- {
1782
- "epoch": 0.67,
1783
- "learning_rate": 3.2708920614713103e-05,
1784
- "loss": 2.5076,
1785
- "step": 14800
1786
- },
1787
- {
1788
- "epoch": 0.68,
1789
- "learning_rate": 3.248158588706011e-05,
1790
- "loss": 2.4916,
1791
- "step": 14850
1792
- },
1793
- {
1794
- "epoch": 0.68,
1795
- "learning_rate": 3.2254251159407115e-05,
1796
- "loss": 2.4919,
1797
- "step": 14900
1798
- },
1799
- {
1800
- "epoch": 0.68,
1801
- "learning_rate": 3.202691643175412e-05,
1802
- "loss": 2.5042,
1803
- "step": 14950
1804
- },
1805
- {
1806
- "epoch": 0.68,
1807
- "learning_rate": 3.1799581704101126e-05,
1808
- "loss": 2.5191,
1809
- "step": 15000
1810
- },
1811
- {
1812
- "epoch": 0.68,
1813
- "learning_rate": 3.1572246976448124e-05,
1814
- "loss": 2.5034,
1815
- "step": 15050
1816
- },
1817
- {
1818
- "epoch": 0.69,
1819
- "learning_rate": 3.134491224879513e-05,
1820
- "loss": 2.4878,
1821
- "step": 15100
1822
- },
1823
- {
1824
- "epoch": 0.69,
1825
- "learning_rate": 3.1117577521142136e-05,
1826
- "loss": 2.5072,
1827
- "step": 15150
1828
- },
1829
- {
1830
- "epoch": 0.69,
1831
- "learning_rate": 3.0890242793489134e-05,
1832
- "loss": 2.506,
1833
- "step": 15200
1834
- },
1835
- {
1836
- "epoch": 0.69,
1837
- "learning_rate": 3.066290806583614e-05,
1838
- "loss": 2.4885,
1839
- "step": 15250
1840
- },
1841
- {
1842
- "epoch": 0.7,
1843
- "learning_rate": 3.0435573338183142e-05,
1844
- "loss": 2.488,
1845
- "step": 15300
1846
- },
1847
- {
1848
- "epoch": 0.7,
1849
- "learning_rate": 3.0208238610530148e-05,
1850
- "loss": 2.4939,
1851
- "step": 15350
1852
- },
1853
- {
1854
- "epoch": 0.7,
1855
- "learning_rate": 2.998090388287715e-05,
1856
- "loss": 2.5397,
1857
- "step": 15400
1858
- },
1859
- {
1860
- "epoch": 0.7,
1861
- "learning_rate": 2.9753569155224152e-05,
1862
- "loss": 2.5131,
1863
- "step": 15450
1864
- },
1865
- {
1866
- "epoch": 0.7,
1867
- "learning_rate": 2.9526234427571158e-05,
1868
- "loss": 2.5287,
1869
- "step": 15500
1870
- },
1871
- {
1872
- "epoch": 0.71,
1873
- "learning_rate": 2.929889969991816e-05,
1874
- "loss": 2.4852,
1875
- "step": 15550
1876
- },
1877
- {
1878
- "epoch": 0.71,
1879
- "learning_rate": 2.9071564972265165e-05,
1880
- "loss": 2.4941,
1881
- "step": 15600
1882
- },
1883
- {
1884
- "epoch": 0.71,
1885
- "learning_rate": 2.884423024461217e-05,
1886
- "loss": 2.508,
1887
- "step": 15650
1888
- },
1889
- {
1890
- "epoch": 0.71,
1891
- "learning_rate": 2.8616895516959173e-05,
1892
- "loss": 2.5011,
1893
- "step": 15700
1894
- },
1895
- {
1896
- "epoch": 0.72,
1897
- "learning_rate": 2.8389560789306175e-05,
1898
- "loss": 2.5029,
1899
- "step": 15750
1900
- },
1901
- {
1902
- "epoch": 0.72,
1903
- "learning_rate": 2.816222606165318e-05,
1904
- "loss": 2.4956,
1905
- "step": 15800
1906
- },
1907
- {
1908
- "epoch": 0.72,
1909
- "learning_rate": 2.7934891334000186e-05,
1910
- "loss": 2.4998,
1911
- "step": 15850
1912
- },
1913
- {
1914
- "epoch": 0.72,
1915
- "learning_rate": 2.7707556606347185e-05,
1916
- "loss": 2.4954,
1917
- "step": 15900
1918
- },
1919
- {
1920
- "epoch": 0.73,
1921
- "learning_rate": 2.748022187869419e-05,
1922
- "loss": 2.5171,
1923
- "step": 15950
1924
- },
1925
- {
1926
- "epoch": 0.73,
1927
- "learning_rate": 2.7252887151041196e-05,
1928
- "loss": 2.476,
1929
- "step": 16000
1930
- },
1931
- {
1932
- "epoch": 0.73,
1933
- "learning_rate": 2.7025552423388202e-05,
1934
- "loss": 2.506,
1935
- "step": 16050
1936
- },
1937
- {
1938
- "epoch": 0.73,
1939
- "learning_rate": 2.67982176957352e-05,
1940
- "loss": 2.5201,
1941
- "step": 16100
1942
- },
1943
- {
1944
- "epoch": 0.73,
1945
- "learning_rate": 2.6570882968082206e-05,
1946
- "loss": 2.5205,
1947
- "step": 16150
1948
- },
1949
- {
1950
- "epoch": 0.74,
1951
- "learning_rate": 2.6343548240429212e-05,
1952
- "loss": 2.4971,
1953
- "step": 16200
1954
- },
1955
- {
1956
- "epoch": 0.74,
1957
- "learning_rate": 2.611621351277621e-05,
1958
- "loss": 2.5135,
1959
- "step": 16250
1960
- },
1961
- {
1962
- "epoch": 0.74,
1963
- "learning_rate": 2.5888878785123216e-05,
1964
- "loss": 2.4894,
1965
- "step": 16300
1966
- },
1967
- {
1968
- "epoch": 0.74,
1969
- "learning_rate": 2.5661544057470222e-05,
1970
- "loss": 2.5127,
1971
- "step": 16350
1972
- },
1973
- {
1974
- "epoch": 0.75,
1975
- "learning_rate": 2.5434209329817227e-05,
1976
- "loss": 2.4999,
1977
- "step": 16400
1978
- },
1979
- {
1980
- "epoch": 0.75,
1981
- "learning_rate": 2.5206874602164226e-05,
1982
- "loss": 2.5048,
1983
- "step": 16450
1984
- },
1985
- {
1986
- "epoch": 0.75,
1987
- "learning_rate": 2.4979539874511232e-05,
1988
- "loss": 2.5208,
1989
- "step": 16500
1990
- },
1991
- {
1992
- "epoch": 0.75,
1993
- "learning_rate": 2.4752205146858234e-05,
1994
- "loss": 2.5155,
1995
- "step": 16550
1996
- },
1997
- {
1998
- "epoch": 0.75,
1999
- "learning_rate": 2.452487041920524e-05,
2000
- "loss": 2.5196,
2001
- "step": 16600
2002
- },
2003
- {
2004
- "epoch": 0.76,
2005
- "learning_rate": 2.429753569155224e-05,
2006
- "loss": 2.5205,
2007
- "step": 16650
2008
- },
2009
- {
2010
- "epoch": 0.76,
2011
- "learning_rate": 2.4070200963899247e-05,
2012
- "loss": 2.5083,
2013
- "step": 16700
2014
- },
2015
- {
2016
- "epoch": 0.76,
2017
- "learning_rate": 2.384286623624625e-05,
2018
- "loss": 2.506,
2019
- "step": 16750
2020
- },
2021
- {
2022
- "epoch": 0.76,
2023
- "learning_rate": 2.361553150859325e-05,
2024
- "loss": 2.5251,
2025
- "step": 16800
2026
- },
2027
- {
2028
- "epoch": 0.77,
2029
- "learning_rate": 2.3388196780940257e-05,
2030
- "loss": 2.5124,
2031
- "step": 16850
2032
- },
2033
- {
2034
- "epoch": 0.77,
2035
- "learning_rate": 2.316086205328726e-05,
2036
- "loss": 2.4869,
2037
- "step": 16900
2038
- },
2039
- {
2040
- "epoch": 0.77,
2041
- "learning_rate": 2.2933527325634265e-05,
2042
- "loss": 2.5066,
2043
- "step": 16950
2044
- },
2045
- {
2046
- "epoch": 0.77,
2047
- "learning_rate": 2.2706192597981267e-05,
2048
- "loss": 2.4888,
2049
- "step": 17000
2050
- },
2051
- {
2052
- "epoch": 0.78,
2053
- "learning_rate": 2.2478857870328273e-05,
2054
- "loss": 2.5086,
2055
- "step": 17050
2056
- },
2057
- {
2058
- "epoch": 0.78,
2059
- "learning_rate": 2.2251523142675275e-05,
2060
- "loss": 2.5449,
2061
- "step": 17100
2062
- },
2063
- {
2064
- "epoch": 0.78,
2065
- "learning_rate": 2.202418841502228e-05,
2066
- "loss": 2.5186,
2067
- "step": 17150
2068
- },
2069
- {
2070
- "epoch": 0.78,
2071
- "learning_rate": 2.1796853687369283e-05,
2072
- "loss": 2.482,
2073
- "step": 17200
2074
- },
2075
- {
2076
- "epoch": 0.78,
2077
- "learning_rate": 2.1569518959716288e-05,
2078
- "loss": 2.4895,
2079
- "step": 17250
2080
- },
2081
- {
2082
- "epoch": 0.79,
2083
- "learning_rate": 2.1342184232063294e-05,
2084
- "loss": 2.4988,
2085
- "step": 17300
2086
- },
2087
- {
2088
- "epoch": 0.79,
2089
- "learning_rate": 2.1114849504410296e-05,
2090
- "loss": 2.5107,
2091
- "step": 17350
2092
- },
2093
- {
2094
- "epoch": 0.79,
2095
- "learning_rate": 2.0887514776757298e-05,
2096
- "loss": 2.5257,
2097
- "step": 17400
2098
- },
2099
- {
2100
- "epoch": 0.79,
2101
- "learning_rate": 2.0660180049104304e-05,
2102
- "loss": 2.5139,
2103
- "step": 17450
2104
- },
2105
- {
2106
- "epoch": 0.8,
2107
- "learning_rate": 2.0432845321451306e-05,
2108
- "loss": 2.5364,
2109
- "step": 17500
2110
- },
2111
- {
2112
- "epoch": 0.8,
2113
- "learning_rate": 2.020551059379831e-05,
2114
- "loss": 2.5242,
2115
- "step": 17550
2116
- },
2117
- {
2118
- "epoch": 0.8,
2119
- "learning_rate": 1.9978175866145313e-05,
2120
- "loss": 2.482,
2121
- "step": 17600
2122
- },
2123
- {
2124
- "epoch": 0.8,
2125
- "learning_rate": 1.975084113849232e-05,
2126
- "loss": 2.4981,
2127
- "step": 17650
2128
- },
2129
- {
2130
- "epoch": 0.8,
2131
- "learning_rate": 1.952350641083932e-05,
2132
- "loss": 2.5049,
2133
- "step": 17700
2134
- },
2135
- {
2136
- "epoch": 0.81,
2137
- "learning_rate": 1.9296171683186323e-05,
2138
- "loss": 2.5089,
2139
- "step": 17750
2140
- },
2141
- {
2142
- "epoch": 0.81,
2143
- "learning_rate": 1.906883695553333e-05,
2144
- "loss": 2.4937,
2145
- "step": 17800
2146
- },
2147
- {
2148
- "epoch": 0.81,
2149
- "learning_rate": 1.884150222788033e-05,
2150
- "loss": 2.4983,
2151
- "step": 17850
2152
- },
2153
- {
2154
- "epoch": 0.81,
2155
- "learning_rate": 1.8614167500227337e-05,
2156
- "loss": 2.5152,
2157
- "step": 17900
2158
- },
2159
- {
2160
- "epoch": 0.82,
2161
- "learning_rate": 1.838683277257434e-05,
2162
- "loss": 2.5198,
2163
- "step": 17950
2164
- },
2165
- {
2166
- "epoch": 0.82,
2167
- "learning_rate": 1.8159498044921344e-05,
2168
- "loss": 2.5108,
2169
- "step": 18000
2170
- },
2171
- {
2172
- "epoch": 0.82,
2173
- "learning_rate": 1.7932163317268347e-05,
2174
- "loss": 2.5362,
2175
- "step": 18050
2176
- },
2177
- {
2178
- "epoch": 0.82,
2179
- "learning_rate": 1.770482858961535e-05,
2180
- "loss": 2.5186,
2181
- "step": 18100
2182
- },
2183
- {
2184
- "epoch": 0.83,
2185
- "learning_rate": 1.7477493861962354e-05,
2186
- "loss": 2.5331,
2187
- "step": 18150
2188
- },
2189
- {
2190
- "epoch": 0.83,
2191
- "learning_rate": 1.7250159134309357e-05,
2192
- "loss": 2.506,
2193
- "step": 18200
2194
- },
2195
- {
2196
- "epoch": 0.83,
2197
- "learning_rate": 1.7022824406656362e-05,
2198
- "loss": 2.4932,
2199
- "step": 18250
2200
- },
2201
- {
2202
- "epoch": 0.83,
2203
- "learning_rate": 1.6795489679003364e-05,
2204
- "loss": 2.4975,
2205
- "step": 18300
2206
- },
2207
- {
2208
- "epoch": 0.83,
2209
- "learning_rate": 1.656815495135037e-05,
2210
- "loss": 2.4996,
2211
- "step": 18350
2212
- },
2213
- {
2214
- "epoch": 0.84,
2215
- "learning_rate": 1.6340820223697372e-05,
2216
- "loss": 2.4987,
2217
- "step": 18400
2218
- },
2219
- {
2220
- "epoch": 0.84,
2221
- "learning_rate": 1.6113485496044374e-05,
2222
- "loss": 2.5013,
2223
- "step": 18450
2224
- },
2225
- {
2226
- "epoch": 0.84,
2227
- "learning_rate": 1.588615076839138e-05,
2228
- "loss": 2.4971,
2229
- "step": 18500
2230
- },
2231
- {
2232
- "epoch": 0.84,
2233
- "learning_rate": 1.5658816040738382e-05,
2234
- "loss": 2.5349,
2235
- "step": 18550
2236
- },
2237
- {
2238
- "epoch": 0.85,
2239
- "learning_rate": 1.5431481313085388e-05,
2240
- "loss": 2.5176,
2241
- "step": 18600
2242
- },
2243
- {
2244
- "epoch": 0.85,
2245
- "learning_rate": 1.5204146585432391e-05,
2246
- "loss": 2.4829,
2247
- "step": 18650
2248
- },
2249
- {
2250
- "epoch": 0.85,
2251
- "learning_rate": 1.4976811857779397e-05,
2252
- "loss": 2.5258,
2253
- "step": 18700
2254
- },
2255
- {
2256
- "epoch": 0.85,
2257
- "learning_rate": 1.47494771301264e-05,
2258
- "loss": 2.5232,
2259
- "step": 18750
2260
- },
2261
- {
2262
- "epoch": 0.85,
2263
- "learning_rate": 1.4522142402473401e-05,
2264
- "loss": 2.5032,
2265
- "step": 18800
2266
- },
2267
- {
2268
- "epoch": 0.86,
2269
- "learning_rate": 1.4294807674820407e-05,
2270
- "loss": 2.5197,
2271
- "step": 18850
2272
- },
2273
- {
2274
- "epoch": 0.86,
2275
- "learning_rate": 1.4067472947167409e-05,
2276
- "loss": 2.5035,
2277
- "step": 18900
2278
- },
2279
- {
2280
- "epoch": 0.86,
2281
- "learning_rate": 1.3840138219514415e-05,
2282
- "loss": 2.5016,
2283
- "step": 18950
2284
- },
2285
- {
2286
- "epoch": 0.86,
2287
- "learning_rate": 1.3612803491861417e-05,
2288
- "loss": 2.5291,
2289
- "step": 19000
2290
- },
2291
- {
2292
- "epoch": 0.87,
2293
- "learning_rate": 1.3385468764208419e-05,
2294
- "loss": 2.5092,
2295
- "step": 19050
2296
- },
2297
- {
2298
- "epoch": 0.87,
2299
- "learning_rate": 1.3158134036555425e-05,
2300
- "loss": 2.4956,
2301
- "step": 19100
2302
- },
2303
- {
2304
- "epoch": 0.87,
2305
- "learning_rate": 1.2930799308902428e-05,
2306
- "loss": 2.4947,
2307
- "step": 19150
2308
- },
2309
- {
2310
- "epoch": 0.87,
2311
- "learning_rate": 1.2703464581249432e-05,
2312
- "loss": 2.519,
2313
- "step": 19200
2314
- },
2315
- {
2316
- "epoch": 0.88,
2317
- "learning_rate": 1.2476129853596436e-05,
2318
- "loss": 2.5452,
2319
- "step": 19250
2320
- },
2321
- {
2322
- "epoch": 0.88,
2323
- "learning_rate": 1.224879512594344e-05,
2324
- "loss": 2.5248,
2325
- "step": 19300
2326
- },
2327
- {
2328
- "epoch": 0.88,
2329
- "learning_rate": 1.2021460398290444e-05,
2330
- "loss": 2.5179,
2331
- "step": 19350
2332
- },
2333
- {
2334
- "epoch": 0.88,
2335
- "learning_rate": 1.1794125670637448e-05,
2336
- "loss": 2.5047,
2337
- "step": 19400
2338
- },
2339
- {
2340
- "epoch": 0.88,
2341
- "learning_rate": 1.1566790942984452e-05,
2342
- "loss": 2.5084,
2343
- "step": 19450
2344
- },
2345
- {
2346
- "epoch": 0.89,
2347
- "learning_rate": 1.1339456215331456e-05,
2348
- "loss": 2.5044,
2349
- "step": 19500
2350
- },
2351
- {
2352
- "epoch": 0.89,
2353
- "learning_rate": 1.1112121487678458e-05,
2354
- "loss": 2.4969,
2355
- "step": 19550
2356
- },
2357
- {
2358
- "epoch": 0.89,
2359
- "learning_rate": 1.0884786760025462e-05,
2360
- "loss": 2.5069,
2361
- "step": 19600
2362
- },
2363
- {
2364
- "epoch": 0.89,
2365
- "learning_rate": 1.0657452032372465e-05,
2366
- "loss": 2.4756,
2367
- "step": 19650
2368
- },
2369
- {
2370
- "epoch": 0.9,
2371
- "learning_rate": 1.043011730471947e-05,
2372
- "loss": 2.489,
2373
- "step": 19700
2374
- },
2375
- {
2376
- "epoch": 0.9,
2377
- "learning_rate": 1.0202782577066473e-05,
2378
- "loss": 2.4985,
2379
- "step": 19750
2380
- },
2381
- {
2382
- "epoch": 0.9,
2383
- "learning_rate": 9.975447849413477e-06,
2384
- "loss": 2.5375,
2385
- "step": 19800
2386
- },
2387
- {
2388
- "epoch": 0.9,
2389
- "learning_rate": 9.748113121760481e-06,
2390
- "loss": 2.4924,
2391
- "step": 19850
2392
- },
2393
- {
2394
- "epoch": 0.9,
2395
- "learning_rate": 9.520778394107483e-06,
2396
- "loss": 2.4879,
2397
- "step": 19900
2398
- },
2399
- {
2400
- "epoch": 0.91,
2401
- "learning_rate": 9.293443666454487e-06,
2402
- "loss": 2.5486,
2403
- "step": 19950
2404
- },
2405
- {
2406
- "epoch": 0.91,
2407
- "learning_rate": 9.066108938801491e-06,
2408
- "loss": 2.5187,
2409
- "step": 20000
2410
- },
2411
- {
2412
- "epoch": 0.91,
2413
- "learning_rate": 8.838774211148495e-06,
2414
- "loss": 2.5102,
2415
- "step": 20050
2416
- },
2417
- {
2418
- "epoch": 0.91,
2419
- "learning_rate": 8.6114394834955e-06,
2420
- "loss": 2.4975,
2421
- "step": 20100
2422
- },
2423
- {
2424
- "epoch": 0.92,
2425
- "learning_rate": 8.384104755842504e-06,
2426
- "loss": 2.5036,
2427
- "step": 20150
2428
- },
2429
- {
2430
- "epoch": 0.92,
2431
- "learning_rate": 8.156770028189506e-06,
2432
- "loss": 2.5323,
2433
- "step": 20200
2434
- },
2435
- {
2436
- "epoch": 0.92,
2437
- "learning_rate": 7.92943530053651e-06,
2438
- "loss": 2.5228,
2439
- "step": 20250
2440
- },
2441
- {
2442
- "epoch": 0.92,
2443
- "learning_rate": 7.702100572883514e-06,
2444
- "loss": 2.5411,
2445
- "step": 20300
2446
- },
2447
- {
2448
- "epoch": 0.93,
2449
- "learning_rate": 7.474765845230518e-06,
2450
- "loss": 2.4923,
2451
- "step": 20350
2452
- },
2453
- {
2454
- "epoch": 0.93,
2455
- "learning_rate": 7.247431117577522e-06,
2456
- "loss": 2.5178,
2457
- "step": 20400
2458
- },
2459
- {
2460
- "epoch": 0.93,
2461
- "learning_rate": 7.020096389924526e-06,
2462
- "loss": 2.5121,
2463
- "step": 20450
2464
- },
2465
- {
2466
- "epoch": 0.93,
2467
- "learning_rate": 6.79276166227153e-06,
2468
- "loss": 2.5414,
2469
- "step": 20500
2470
- },
2471
- {
2472
- "epoch": 0.93,
2473
- "learning_rate": 6.565426934618532e-06,
2474
- "loss": 2.5222,
2475
- "step": 20550
2476
- },
2477
- {
2478
- "epoch": 0.94,
2479
- "learning_rate": 6.338092206965536e-06,
2480
- "loss": 2.5261,
2481
- "step": 20600
2482
- },
2483
- {
2484
- "epoch": 0.94,
2485
- "learning_rate": 6.11075747931254e-06,
2486
- "loss": 2.5274,
2487
- "step": 20650
2488
- },
2489
- {
2490
- "epoch": 0.94,
2491
- "learning_rate": 5.883422751659544e-06,
2492
- "loss": 2.4965,
2493
- "step": 20700
2494
- },
2495
- {
2496
- "epoch": 0.94,
2497
- "learning_rate": 5.656088024006548e-06,
2498
- "loss": 2.5141,
2499
- "step": 20750
2500
- },
2501
- {
2502
- "epoch": 0.95,
2503
- "learning_rate": 5.428753296353551e-06,
2504
- "loss": 2.5101,
2505
- "step": 20800
2506
- },
2507
- {
2508
- "epoch": 0.95,
2509
- "learning_rate": 5.201418568700555e-06,
2510
- "loss": 2.5011,
2511
- "step": 20850
2512
- },
2513
- {
2514
- "epoch": 0.95,
2515
- "learning_rate": 4.974083841047559e-06,
2516
- "loss": 2.5091,
2517
- "step": 20900
2518
- },
2519
- {
2520
- "epoch": 0.95,
2521
- "learning_rate": 4.746749113394562e-06,
2522
- "loss": 2.5237,
2523
- "step": 20950
2524
- },
2525
- {
2526
- "epoch": 0.95,
2527
- "learning_rate": 4.519414385741566e-06,
2528
- "loss": 2.4949,
2529
- "step": 21000
2530
- },
2531
- {
2532
- "epoch": 0.96,
2533
- "learning_rate": 4.29207965808857e-06,
2534
- "loss": 2.503,
2535
- "step": 21050
2536
- },
2537
- {
2538
- "epoch": 0.96,
2539
- "learning_rate": 4.0647449304355735e-06,
2540
- "loss": 2.5068,
2541
- "step": 21100
2542
- },
2543
- {
2544
- "epoch": 0.96,
2545
- "learning_rate": 3.837410202782577e-06,
2546
- "loss": 2.4922,
2547
- "step": 21150
2548
- },
2549
- {
2550
- "epoch": 0.96,
2551
- "learning_rate": 3.6100754751295813e-06,
2552
- "loss": 2.5199,
2553
- "step": 21200
2554
- },
2555
- {
2556
- "epoch": 0.97,
2557
- "learning_rate": 3.382740747476585e-06,
2558
- "loss": 2.5058,
2559
- "step": 21250
2560
- },
2561
- {
2562
- "epoch": 0.97,
2563
- "learning_rate": 3.155406019823588e-06,
2564
- "loss": 2.5294,
2565
- "step": 21300
2566
- },
2567
- {
2568
- "epoch": 0.97,
2569
- "learning_rate": 2.928071292170592e-06,
2570
- "loss": 2.4969,
2571
- "step": 21350
2572
- },
2573
- {
2574
- "epoch": 0.97,
2575
- "learning_rate": 2.700736564517596e-06,
2576
- "loss": 2.5419,
2577
- "step": 21400
2578
- },
2579
- {
2580
- "epoch": 0.98,
2581
- "learning_rate": 2.4734018368645998e-06,
2582
- "loss": 2.5299,
2583
- "step": 21450
2584
- },
2585
- {
2586
- "epoch": 0.98,
2587
- "learning_rate": 2.2460671092116032e-06,
2588
- "loss": 2.5275,
2589
- "step": 21500
2590
- },
2591
- {
2592
- "epoch": 0.98,
2593
- "learning_rate": 2.0187323815586067e-06,
2594
- "loss": 2.4891,
2595
- "step": 21550
2596
- },
2597
- {
2598
- "epoch": 0.98,
2599
- "learning_rate": 1.7913976539056108e-06,
2600
- "loss": 2.5108,
2601
- "step": 21600
2602
- },
2603
- {
2604
- "epoch": 0.98,
2605
- "learning_rate": 1.5640629262526144e-06,
2606
- "loss": 2.5246,
2607
- "step": 21650
2608
- },
2609
- {
2610
- "epoch": 0.99,
2611
- "learning_rate": 1.336728198599618e-06,
2612
- "loss": 2.5304,
2613
- "step": 21700
2614
- },
2615
- {
2616
- "epoch": 0.99,
2617
- "learning_rate": 1.109393470946622e-06,
2618
- "loss": 2.5159,
2619
- "step": 21750
2620
- },
2621
- {
2622
- "epoch": 0.99,
2623
- "learning_rate": 8.820587432936256e-07,
2624
- "loss": 2.5071,
2625
- "step": 21800
2626
- },
2627
- {
2628
- "epoch": 0.99,
2629
- "learning_rate": 6.547240156406293e-07,
2630
- "loss": 2.5091,
2631
- "step": 21850
2632
- },
2633
- {
2634
- "epoch": 1.0,
2635
- "learning_rate": 4.2738928798763303e-07,
2636
- "loss": 2.5386,
2637
- "step": 21900
2638
- },
2639
- {
2640
- "epoch": 1.0,
2641
- "learning_rate": 2.0005456033463672e-07,
2642
- "loss": 2.5228,
2643
- "step": 21950
2644
- },
2645
  {
2646
  "epoch": 1.0,
2647
- "step": 21994,
2648
- "total_flos": 3.604860407937761e+17,
2649
- "train_loss": 2.5232514588412474,
2650
- "train_runtime": 5052.6511,
2651
- "train_samples_per_second": 34.825,
2652
- "train_steps_per_second": 4.353
2653
  }
2654
  ],
2655
  "logging_steps": 50,
2656
- "max_steps": 21994,
2657
  "num_train_epochs": 1,
2658
- "save_steps": 2500,
2659
- "total_flos": 3.604860407937761e+17,
2660
  "trial_name": null,
2661
  "trial_params": null
2662
  }
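The `learning_rate` values in both logs fall on a straight line toward zero. That pattern is consistent with the Trainer's linear scheduler run with no warmup, so the logged rate at a given step would be `base_lr * (max_steps - step) / max_steps`. The sketch below checks that reading against a few logged points. The base rate of 1e-4 and the zero-warmup setting are assumptions inferred from the numbers; neither is stated in the log. `max_steps` (21994 for the old run, 3726 for the new one) comes from the log itself. The sketch also cross-checks the old run's throughput summary. Samples per second divided by steps per second gives about 8 samples per optimizer step, which presumably reflects the effective batch size.

```python
# Minimal sketch: reconstruct the learning-rate schedule implied by log_history.
# ASSUMPTIONS (inferred from the logged values, not stated in the log):
# base learning rate 1e-4, zero warmup steps, linear decay to 0 at max_steps.

def linear_decay_lr(step: int, max_steps: int, base_lr: float = 1e-4) -> float:
    """Learning rate after `step` optimizer steps under linear decay with no warmup."""
    return base_lr * (max_steps - step) / max_steps


# (step, logged learning_rate) pairs copied from the two log_history blocks.
old_run = {"max_steps": 21994,
           "points": [(10000, 5.4533054469400745e-05), (21950, 2.0005456033463672e-07)]}
new_run = {"max_steps": 3726,
           "points": [(50, 9.86580783682233e-05), (500, 8.658078368223296e-05)]}

for name, run in [("old run", old_run), ("new run", new_run)]:
    for step, logged in run["points"]:
        predicted = linear_decay_lr(step, run["max_steps"])
        rel_err = abs(predicted - logged) / logged
        print(f"{name} step {step}: logged={logged:.6e} "
              f"predicted={predicted:.6e} rel_err={rel_err:.1e}")

# Throughput cross-check using the old run's summary entry.
train_runtime = 5052.6511                    # seconds, from "train_runtime"
steps_per_second = 21994 / train_runtime     # should match "train_steps_per_second"
samples_per_step = 34.825 / 4.353            # samples/s divided by steps/s
print(f"steps/s = {steps_per_second:.3f}, samples per step = {samples_per_step:.2f}")
```

A relative error near machine precision at every sampled point supports the linear, zero-warmup reading for both runs. It does not prove it, though, since a different schedule could coincide at these particular steps.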
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 0.9999916130601427,
5
  "eval_steps": 500,
6
+ "global_step": 3726,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.01,
13
+ "learning_rate": 9.86580783682233e-05,
14
+ "loss": 3.6768,
15
  "step": 50
16
  },
17
  {
18
+ "epoch": 0.03,
19
+ "learning_rate": 9.731615673644659e-05,
20
+ "loss": 3.2865,
21
  "step": 100
22
  },
23
  {
24
+ "epoch": 0.04,
25
+ "learning_rate": 9.597423510466989e-05,
26
+ "loss": 3.1518,
27
  "step": 150
28
  },
29
  {
30
+ "epoch": 0.05,
31
+ "learning_rate": 9.463231347289319e-05,
32
+ "loss": 3.1091,
33
  "step": 200
34
  },
35
  {
36
+ "epoch": 0.07,
37
+ "learning_rate": 9.329039184111649e-05,
38
+ "loss": 3.0456,
39
  "step": 250
40
  },
41
  {
42
+ "epoch": 0.08,
43
+ "learning_rate": 9.194847020933978e-05,
44
+ "loss": 3.0357,
45
  "step": 300
46
  },
47
  {
48
+ "epoch": 0.09,
49
+ "learning_rate": 9.060654857756307e-05,
50
+ "loss": 2.9846,
51
  "step": 350
52
  },
53
  {
54
+ "epoch": 0.11,
55
+ "learning_rate": 8.926462694578636e-05,
56
+ "loss": 2.9834,
57
  "step": 400
58
  },
59
  {
60
+ "epoch": 0.12,
61
+ "learning_rate": 8.792270531400967e-05,
62
+ "loss": 2.9385,
63
  "step": 450
64
  },
65
  {
66
+ "epoch": 0.13,
67
+ "learning_rate": 8.658078368223296e-05,
68
+ "loss": 2.8967,
69
  "step": 500
70
  },
71
  {
72
+ "epoch": 0.15,
73
+ "learning_rate": 8.523886205045626e-05,
74
+ "loss": 2.886,
75
  "step": 550
76
  },
77
  {
78
+ "epoch": 0.16,
79
+ "learning_rate": 8.389694041867955e-05,
80
+ "loss": 2.8671,
81
  "step": 600
82
  },
83
  {
84
+ "epoch": 0.17,
85
+ "learning_rate": 8.255501878690284e-05,
86
+ "loss": 2.8547,
87
  "step": 650
88
  },
89
  {
90
+ "epoch": 0.19,
91
+ "learning_rate": 8.121309715512614e-05,
92
+ "loss": 2.8396,
93
  "step": 700
94
  },
95
  {
96
+ "epoch": 0.2,
97
+ "learning_rate": 7.987117552334944e-05,
98
+ "loss": 2.8531,
99
  "step": 750
100
  },
101
  {
102
+ "epoch": 0.21,
103
+ "learning_rate": 7.852925389157274e-05,
104
+ "loss": 2.8196,
105
  "step": 800
106
  },
107
  {
108
+ "epoch": 0.23,
109
+ "learning_rate": 7.718733225979603e-05,
110
+ "loss": 2.7921,
111
  "step": 850
112
  },
113
  {
114
+ "epoch": 0.24,
115
+ "learning_rate": 7.584541062801933e-05,
116
+ "loss": 2.7596,
117
  "step": 900
118
  },
119
  {
120
+ "epoch": 0.25,
121
+ "learning_rate": 7.450348899624262e-05,
122
+ "loss": 2.7918,
123
  "step": 950
124
  },
125
  {
126
+ "epoch": 0.27,
127
+ "learning_rate": 7.316156736446593e-05,
128
+ "loss": 2.7553,
129
  "step": 1000
130
  },
131
  {
132
+ "epoch": 0.28,
133
+ "learning_rate": 7.181964573268921e-05,
134
+ "loss": 2.7914,
135
  "step": 1050
136
  },
137
  {
138
+ "epoch": 0.3,
139
+ "learning_rate": 7.047772410091251e-05,
140
+ "loss": 2.7924,
141
  "step": 1100
142
  },
143
  {
144
+ "epoch": 0.31,
145
+ "learning_rate": 6.91358024691358e-05,
146
+ "loss": 2.7823,
147
  "step": 1150
148
  },
149
  {
150
+ "epoch": 0.32,
151
+ "learning_rate": 6.77938808373591e-05,
152
+ "loss": 2.7437,
153
  "step": 1200
154
  },
155
  {
156
+ "epoch": 0.34,
157
+ "learning_rate": 6.64519592055824e-05,
158
+ "loss": 2.7404,
159
  "step": 1250
160
  },
161
  {
162
+ "epoch": 0.35,
163
+ "learning_rate": 6.51100375738057e-05,
164
+ "loss": 2.7318,
165
  "step": 1300
166
  },
167
  {
168
+ "epoch": 0.36,
169
+ "learning_rate": 6.376811594202898e-05,
170
+ "loss": 2.708,
171
  "step": 1350
172
  },
173
  {
174
+ "epoch": 0.38,
175
+ "learning_rate": 6.242619431025228e-05,
176
+ "loss": 2.7579,
177
  "step": 1400
178
  },
179
  {
180
+ "epoch": 0.39,
181
+ "learning_rate": 6.108427267847558e-05,
182
+ "loss": 2.7037,
183
  "step": 1450
184
  },
185
  {
186
+ "epoch": 0.4,
187
+ "learning_rate": 5.9742351046698876e-05,
188
+ "loss": 2.7326,
189
  "step": 1500
190
  },
191
  {
192
+ "epoch": 0.42,
193
+ "learning_rate": 5.8400429414922176e-05,
194
+ "loss": 2.7252,
195
  "step": 1550
196
  },
197
  {
198
+ "epoch": 0.43,
199
+ "learning_rate": 5.705850778314546e-05,
200
+ "loss": 2.7263,
201
  "step": 1600
202
  },
203
  {
204
+ "epoch": 0.44,
205
+ "learning_rate": 5.571658615136877e-05,
206
+ "loss": 2.6944,
207
  "step": 1650
208
  },
209
  {
210
+ "epoch": 0.46,
211
+ "learning_rate": 5.4374664519592054e-05,
212
+ "loss": 2.7292,
213
  "step": 1700
214
  },
215
  {
216
+ "epoch": 0.47,
217
+ "learning_rate": 5.3032742887815354e-05,
218
+ "loss": 2.7156,
219
  "step": 1750
220
  },
221
  {
222
+ "epoch": 0.48,
223
+ "learning_rate": 5.1690821256038647e-05,
224
+ "loss": 2.6852,
225
  "step": 1800
226
  },
227
  {
228
+ "epoch": 0.5,
229
+ "learning_rate": 5.0348899624261946e-05,
230
+ "loss": 2.6922,
231
  "step": 1850
232
  },
233
  {
234
+ "epoch": 0.51,
235
+ "learning_rate": 4.9006977992485246e-05,
236
+ "loss": 2.7217,
237
  "step": 1900
238
  },
239
  {
240
+ "epoch": 0.52,
241
+ "learning_rate": 4.766505636070854e-05,
242
+ "loss": 2.6923,
243
  "step": 1950
244
  },
245
  {
246
+ "epoch": 0.54,
247
+ "learning_rate": 4.632313472893183e-05,
248
+ "loss": 2.6993,
249
  "step": 2000
250
  },
251
  {
252
+ "epoch": 0.55,
253
+ "learning_rate": 4.498121309715513e-05,
254
+ "loss": 2.7156,
255
  "step": 2050
256
  },
257
  {
258
+ "epoch": 0.56,
259
+ "learning_rate": 4.3639291465378424e-05,
260
+ "loss": 2.6932,
261
  "step": 2100
262
  },
263
  {
264
+ "epoch": 0.58,
265
+ "learning_rate": 4.2297369833601716e-05,
266
+ "loss": 2.714,
267
  "step": 2150
268
  },
269
  {
270
+ "epoch": 0.59,
271
+ "learning_rate": 4.0955448201825016e-05,
272
+ "loss": 2.6921,
273
  "step": 2200
274
  },
275
  {
276
+ "epoch": 0.6,
277
+ "learning_rate": 3.961352657004831e-05,
278
+ "loss": 2.6949,
279
  "step": 2250
280
  },
281
  {
282
+ "epoch": 0.62,
283
+ "learning_rate": 3.82716049382716e-05,
284
+ "loss": 2.694,
285
  "step": 2300
286
  },
287
  {
288
+ "epoch": 0.63,
289
+ "learning_rate": 3.69296833064949e-05,
290
+ "loss": 2.6781,
291
  "step": 2350
292
  },
293
  {
294
+ "epoch": 0.64,
295
+ "learning_rate": 3.55877616747182e-05,
296
+ "loss": 2.6932,
297
  "step": 2400
298
  },
299
  {
300
+ "epoch": 0.66,
301
+ "learning_rate": 3.4245840042941493e-05,
302
+ "loss": 2.6748,
303
  "step": 2450
304
  },
305
  {
306
+ "epoch": 0.67,
307
+ "learning_rate": 3.290391841116479e-05,
308
+ "loss": 2.6861,
309
  "step": 2500
310
  },
311
  {
312
+ "epoch": 0.68,
313
+ "learning_rate": 3.1561996779388086e-05,
314
+ "loss": 2.6891,
315
  "step": 2550
316
  },
317
  {
318
+ "epoch": 0.7,
319
+ "learning_rate": 3.0220075147611382e-05,
320
+ "loss": 2.7091,
321
  "step": 2600
322
  },
323
  {
324
+ "epoch": 0.71,
325
+ "learning_rate": 2.8878153515834678e-05,
326
+ "loss": 2.6993,
327
  "step": 2650
328
  },
329
  {
330
+ "epoch": 0.72,
331
+ "learning_rate": 2.753623188405797e-05,
332
+ "loss": 2.6827,
333
  "step": 2700
334
  },
335
  {
336
+ "epoch": 0.74,
337
+ "learning_rate": 2.6194310252281267e-05,
338
+ "loss": 2.6876,
339
  "step": 2750
340
  },
341
  {
342
+ "epoch": 0.75,
343
+ "learning_rate": 2.4852388620504563e-05,
344
+ "loss": 2.6812,
345
  "step": 2800
346
  },
347
  {
348
+ "epoch": 0.76,
349
+ "learning_rate": 2.351046698872786e-05,
350
+ "loss": 2.6936,
351
  "step": 2850
352
  },
353
  {
354
+ "epoch": 0.78,
355
+ "learning_rate": 2.2168545356951156e-05,
356
+ "loss": 2.6579,
357
  "step": 2900
358
  },
359
  {
360
+ "epoch": 0.79,
361
+ "learning_rate": 2.0826623725174452e-05,
362
+ "loss": 2.6786,
363
  "step": 2950
364
  },
365
  {
366
+ "epoch": 0.81,
367
+ "learning_rate": 1.9484702093397748e-05,
368
+ "loss": 2.6974,
369
  "step": 3000
370
  },
371
  {
372
+ "epoch": 0.82,
373
+ "learning_rate": 1.814278046162104e-05,
374
+ "loss": 2.6818,
375
  "step": 3050
376
  },
377
  {
378
+ "epoch": 0.83,
379
+ "learning_rate": 1.6800858829844337e-05,
380
+ "loss": 2.6742,
381
  "step": 3100
382
  },
383
  {
384
+ "epoch": 0.85,
385
+ "learning_rate": 1.5458937198067633e-05,
386
+ "loss": 2.6748,
387
  "step": 3150
388
  },
389
  {
390
+ "epoch": 0.86,
391
+ "learning_rate": 1.4117015566290927e-05,
392
+ "loss": 2.6771,
393
  "step": 3200
394
  },
395
  {
396
+ "epoch": 0.87,
397
+ "learning_rate": 1.2775093934514227e-05,
398
+ "loss": 2.6895,
399
  "step": 3250
400
  },
401
  {
402
+ "epoch": 0.89,
403
+ "learning_rate": 1.143317230273752e-05,
404
+ "loss": 2.6663,
405
  "step": 3300
406
  },
407
  {
408
+ "epoch": 0.9,
409
+ "learning_rate": 1.0091250670960816e-05,
410
+ "loss": 2.6485,
411
  "step": 3350
412
  },
413
  {
414
+ "epoch": 0.91,
415
+ "learning_rate": 8.749329039184112e-06,
416
+ "loss": 2.6825,
417
  "step": 3400
418
  },
419
  {
420
+ "epoch": 0.93,
421
+ "learning_rate": 7.4074074074074075e-06,
422
+ "loss": 2.6832,
423
  "step": 3450
424
  },
425
  {
426
+ "epoch": 0.94,
427
+ "learning_rate": 6.065485775630704e-06,
428
+ "loss": 2.7134,
429
  "step": 3500
430
  },
431
  {
432
+ "epoch": 0.95,
433
+ "learning_rate": 4.723564143853999e-06,
434
+ "loss": 2.6631,
435
  "step": 3550
436
  },
437
  {
438
+ "epoch": 0.97,
439
+ "learning_rate": 3.3816425120772947e-06,
440
+ "loss": 2.6862,
441
  "step": 3600
442
  },
443
  {
444
+ "epoch": 0.98,
445
+ "learning_rate": 2.0397208803005905e-06,
446
+ "loss": 2.6663,
447
  "step": 3650
448
  },
449
  {
450
+ "epoch": 0.99,
451
+ "learning_rate": 6.977992485238862e-07,
452
+ "loss": 2.6592,
453
  "step": 3700
454
  },
455
  {
456
  "epoch": 1.0,
457
+ "step": 3726,
458
+ "total_flos": 2.4427952859827405e+17,
459
+ "train_loss": 2.7717256223521947,
460
+ "train_runtime": 3102.8464,
461
+ "train_samples_per_second": 38.427,
462
+ "train_steps_per_second": 1.201
463
  }
464
  ],
465
  "logging_steps": 50,
466
+ "max_steps": 3726,
467
  "num_train_epochs": 1,
468
+ "save_steps": 5000,
469
+ "total_flos": 2.4427952859827405e+17,
470
  "trial_name": null,
471
  "trial_params": null
472
  }
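The logged learning rates in both versions of `trainer_state.json` are consistent with a linear decay to zero with no warmup. The peak is assumed to be 1e-4; that value is inferred from the logs, not read from `training_args.bin`. Under that assumption, lr(step) = 1e-4 × (max_steps − step) / max_steps. The Python sketch below checks several logged points against this assumed schedule. It also divides samples/s by steps/s to estimate the effective batch size per optimizer step, which comes out to about 8 for the old run and about 32 for the new one.

```python
# Logged (step, learning_rate) pairs copied from the two trainer_state.json
# versions in this diff, keyed by each run's max_steps.
RUNS = {
    21994: [(21650, 1.5640629262526144e-06), (21950, 2.0005456033463672e-07)],
    3726: [(50, 9.86580783682233e-05), (1850, 5.0348899624261946e-05),
           (3700, 6.977992485238862e-07)],
}

PEAK_LR = 1e-4  # assumption: inferred from the logs, not read from training_args.bin


def linear_lr(step: int, max_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Learning rate under an assumed linear decay from peak_lr to 0, no warmup."""
    return peak_lr * (max_steps - step) / max_steps


for max_steps, points in RUNS.items():
    for step, logged in points:
        predicted = linear_lr(step, max_steps)
        print(f"max_steps={max_steps} step={step}: "
              f"logged={logged:.6e} predicted={predicted:.6e}")

# Ratio of the two throughput figures approximates the number of samples
# consumed per optimizer step, i.e. the effective batch size.
for name, samples_s, steps_s in [("old", 34.825, 4.353), ("new", 38.427, 1.201)]:
    print(f"{name} run: ~{samples_s / steps_s:.1f} samples per step")
```

If the predicted and logged values agree to the printed precision, the schedule assumption holds for these points. Reading the actual `lr_scheduler_type` and `learning_rate` from `training_args.bin` would confirm it.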
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2458e0243aaab0a29fd49a56d9466fcdd0e8ef0f37d199202b23e025f597ffca
3
  size 4027
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ce83680fb3cb44202bb32f2eca1b7e1d9dd35259d3cb4ec2c75bd5a87a2385d3
3
  size 4027