07/25/2024 06:16:39 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 4
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: fp16

07/25/2024 06:16:39 - WARNING - huggingface_hub.repository - /dli/gptesla-small/./ is already a clone of https://huggingface.co/shng2025/gptesla-small. Make sure you pull the latest changes with `repo.git_pull()`.
07/25/2024 06:16:40 - WARNING - huggingface_hub.repository - Revision `hopeful-snow-127` does not exist. Created and checked out branch `hopeful-snow-127`.
07/25/2024 06:16:40 - WARNING - huggingface_hub.repository - 
07/25/2024 06:16:41 - DEBUG - datasets.utils._dataset_viewer - Dataset info for shng2025/gptesla-train is not completely ready yet.
07/25/2024 06:16:41 - INFO - datasets.builder - No config specified, defaulting to the single config: gptesla-train/default
07/25/2024 06:16:41 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#1, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#2, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#3, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#4, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#5, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#7, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#8, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#9, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#6, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#10, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#11, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#12, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#13, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#14, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#15, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#16, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#17, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#18, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#19, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#20, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#21, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#22, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#24, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#23, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#25, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#26, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#27, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#28, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#29, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#31, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#30, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#32, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#33, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#34, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#35, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#36, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#37, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#38, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#39, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#40, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#41, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#43, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#44, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#45, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#46, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#42, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#47, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#48, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#49, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#50, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#51, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#52, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#54, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#53, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#55, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#56, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#57, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#58, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#60, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#59, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#61, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#62, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#63, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#64, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#65, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#66, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#68, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#67, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#69, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#70, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#71, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#72, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#73, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#74, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#75, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#77, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#78, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#76, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#79, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#80, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#81, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#82, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#83, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#84, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#85, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#86, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#87, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#88, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#89, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#90, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#91, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#92, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#94, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#93, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#95, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491327 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489635 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497218 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500930 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10621496 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10668116 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489599 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492277 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495973 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511604 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491889 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488608 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10562022 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487790 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10863935 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525688 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488098 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525926 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491272 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497335 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486397 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10493913 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10515063 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10751338 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488150 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10949076 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10501535 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:49 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:49 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:54 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 48, 'steps': 0, 'loss/train': 10.554669380187988}
07/25/2024 06:16:55 - INFO - __main__ - Step 2: {'lr': 7.142857142857143e-07, 'samples': 96, 'steps': 1, 'loss/train': 10.494059562683105}
07/25/2024 06:22:39 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 4
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: fp16

07/25/2024 06:22:39 - WARNING - huggingface_hub.repository - /dli/gptesla-small/./ is already a clone of https://huggingface.co/shng2025/gptesla-small. Make sure you pull the latest changes with `repo.git_pull()`.
07/25/2024 06:22:39 - WARNING - huggingface_hub.repository - Revision `celestial-aardvark-128` does not exist. Created and checked out branch `celestial-aardvark-128`.
07/25/2024 06:22:39 - WARNING - huggingface_hub.repository - 
07/25/2024 06:22:41 - DEBUG - datasets.utils._dataset_viewer - Dataset info for shng2025/gptesla-train is not completely ready yet.
07/25/2024 06:22:41 - INFO - datasets.builder - No config specified, defaulting to the single config: gptesla-train/default
07/25/2024 06:22:41 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#1, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#4, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#5, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#2, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#3, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#6, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#7, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#8, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#9, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#10, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#12, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#15, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#14, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#16, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#13, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#11, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#17, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#18, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#19, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#20, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#22, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#23, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#24, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#25, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#21, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#28, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#27, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#26, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#29, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#30, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#31, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#32, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#33, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#34, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#35, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#36, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#37, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#38, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#39, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#40, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#41, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#42, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#43, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#44, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#45, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#46, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#47, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#48, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#49, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#50, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#52, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#53, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#54, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#51, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#55, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#56, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#57, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#58, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#59, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#60, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#61, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#62, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#63, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#64, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#65, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#67, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#66, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#68, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#69, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#70, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#72, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#73, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#74, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#75, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#76, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#71, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#77, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#78, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#79, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#80, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#81, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#82, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#83, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#84, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#85, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#86, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#87, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#88, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#89, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#90, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#92, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#91, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#93, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#94, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#95, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500930 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492277 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525688 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489635 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10668116 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10522596 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10512203 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497218 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486397 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10536479 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10863935 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491327 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10562022 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485842 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489599 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10493913 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10949076 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10553677 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10598254 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10553677 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10515063 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487790 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485847 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488385 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10610581 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495973 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497062 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488098 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511500 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525926 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488150 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486801 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10499106 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491272 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511604 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11286262 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491889 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487725 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486276 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11286262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488608 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10501535 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497335 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489575 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485918 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491547 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487097 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10751338 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10621496 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10498167 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486172 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10686322 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10499607 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511515 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11115863 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10530453 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492554 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10640425 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487482 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500290 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10676628 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:59 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 48, 'steps': 0, 'loss/train': 10.554669380187988}
07/25/2024 06:23:02 - INFO - __main__ - Step 2: {'lr': 7.142857142857143e-07, 'samples': 96, 'steps': 1, 'loss/train': 10.494059562683105}
07/25/2024 06:23:02 - INFO - __main__ - Step 3: {'lr': 1.4285714285714286e-06, 'samples': 144, 'steps': 2, 'loss/train': 10.507988929748535}
07/25/2024 06:23:03 - INFO - __main__ - Step 4: {'lr': 2.142857142857143e-06, 'samples': 192, 'steps': 3, 'loss/train': 10.415447235107422}
07/25/2024 06:23:03 - INFO - __main__ - Step 5: {'lr': 2.8571428571428573e-06, 'samples': 240, 'steps': 4, 'loss/train': 10.345850944519043}
07/25/2024 06:23:03 - INFO - __main__ - Step 6: {'lr': 3.5714285714285714e-06, 'samples': 288, 'steps': 5, 'loss/train': 10.195524215698242}
07/25/2024 06:23:03 - INFO - __main__ - Step 7: {'lr': 4.285714285714286e-06, 'samples': 336, 'steps': 6, 'loss/train': 10.09341812133789}
07/25/2024 06:23:04 - INFO - __main__ - Step 8: {'lr': 5e-06, 'samples': 384, 'steps': 7, 'loss/train': 9.965239524841309}
07/25/2024 06:23:04 - INFO - __main__ - Step 9: {'lr': 5.7142857142857145e-06, 'samples': 432, 'steps': 8, 'loss/train': 9.698853492736816}
07/25/2024 06:23:04 - INFO - __main__ - Step 10: {'lr': 6.428571428571429e-06, 'samples': 480, 'steps': 9, 'loss/train': 9.80683708190918}
07/25/2024 06:23:05 - INFO - __main__ - Step 11: {'lr': 7.142857142857143e-06, 'samples': 528, 'steps': 10, 'loss/train': 9.633079528808594}
07/25/2024 06:23:05 - INFO - __main__ - Step 12: {'lr': 7.857142857142858e-06, 'samples': 576, 'steps': 11, 'loss/train': 9.700591087341309}
07/25/2024 06:23:05 - INFO - __main__ - Step 13: {'lr': 8.571428571428573e-06, 'samples': 624, 'steps': 12, 'loss/train': 9.603139877319336}
07/25/2024 06:23:05 - INFO - __main__ - Step 14: {'lr': 9.285714285714286e-06, 'samples': 672, 'steps': 13, 'loss/train': 9.30308723449707}
07/25/2024 06:23:06 - INFO - __main__ - Step 15: {'lr': 1e-05, 'samples': 720, 'steps': 14, 'loss/train': 9.333526611328125}
07/25/2024 06:23:06 - INFO - __main__ - Step 16: {'lr': 1.0714285714285714e-05, 'samples': 768, 'steps': 15, 'loss/train': 8.336181640625}
07/25/2024 06:23:06 - INFO - __main__ - Step 17: {'lr': 1.1428571428571429e-05, 'samples': 816, 'steps': 16, 'loss/train': 9.075631141662598}
07/25/2024 06:23:07 - INFO - __main__ - Step 18: {'lr': 1.2142857142857142e-05, 'samples': 864, 'steps': 17, 'loss/train': 9.18478012084961}
07/25/2024 06:23:07 - INFO - __main__ - Step 19: {'lr': 1.2857142857142857e-05, 'samples': 912, 'steps': 18, 'loss/train': 8.96328353881836}
07/25/2024 06:23:07 - INFO - __main__ - Step 20: {'lr': 1.3571428571428572e-05, 'samples': 960, 'steps': 19, 'loss/train': 9.45018196105957}
07/25/2024 06:23:07 - INFO - __main__ - Step 21: {'lr': 1.4285714285714285e-05, 'samples': 1008, 'steps': 20, 'loss/train': 8.517333984375}
07/25/2024 06:23:08 - INFO - __main__ - Step 22: {'lr': 1.5e-05, 'samples': 1056, 'steps': 21, 'loss/train': 9.207684516906738}
07/25/2024 06:23:08 - INFO - __main__ - Step 23: {'lr': 1.5714285714285715e-05, 'samples': 1104, 'steps': 22, 'loss/train': 8.681092262268066}
07/25/2024 06:23:08 - INFO - __main__ - Step 24: {'lr': 1.642857142857143e-05, 'samples': 1152, 'steps': 23, 'loss/train': 8.316036224365234}
07/25/2024 06:23:09 - INFO - __main__ - Step 25: {'lr': 1.7142857142857145e-05, 'samples': 1200, 'steps': 24, 'loss/train': 8.944169044494629}
07/25/2024 06:23:09 - INFO - __main__ - Step 26: {'lr': 1.7857142857142855e-05, 'samples': 1248, 'steps': 25, 'loss/train': 8.878201484680176}
07/25/2024 06:23:09 - INFO - __main__ - Step 27: {'lr': 1.8571428571428572e-05, 'samples': 1296, 'steps': 26, 'loss/train': 9.158102989196777}
07/25/2024 06:23:09 - INFO - __main__ - Step 28: {'lr': 1.9285714285714285e-05, 'samples': 1344, 'steps': 27, 'loss/train': 9.14354419708252}
07/25/2024 06:23:10 - INFO - __main__ - Step 29: {'lr': 2e-05, 'samples': 1392, 'steps': 28, 'loss/train': 8.860624313354492}
07/25/2024 06:23:10 - INFO - __main__ - Step 30: {'lr': 2.0714285714285715e-05, 'samples': 1440, 'steps': 29, 'loss/train': 8.876450538635254}
07/25/2024 06:23:10 - INFO - __main__ - Step 31: {'lr': 2.1428571428571428e-05, 'samples': 1488, 'steps': 30, 'loss/train': 8.425738334655762}
07/25/2024 06:23:10 - INFO - __main__ - Step 32: {'lr': 2.214285714285714e-05, 'samples': 1536, 'steps': 31, 'loss/train': 8.942279815673828}
07/25/2024 06:23:11 - INFO - __main__ - Step 33: {'lr': 2.2857142857142858e-05, 'samples': 1584, 'steps': 32, 'loss/train': 8.757084846496582}
07/25/2024 06:23:11 - INFO - __main__ - Step 34: {'lr': 2.3571428571428575e-05, 'samples': 1632, 'steps': 33, 'loss/train': 8.699286460876465}
07/25/2024 06:23:11 - INFO - __main__ - Step 35: {'lr': 2.4285714285714285e-05, 'samples': 1680, 'steps': 34, 'loss/train': 8.857367515563965}
07/25/2024 06:23:12 - INFO - __main__ - Step 36: {'lr': 2.5e-05, 'samples': 1728, 'steps': 35, 'loss/train': 8.830195426940918}
07/25/2024 06:23:12 - INFO - __main__ - Step 37: {'lr': 2.5714285714285714e-05, 'samples': 1776, 'steps': 36, 'loss/train': 8.944982528686523}
07/25/2024 06:23:12 - INFO - __main__ - Step 38: {'lr': 2.642857142857143e-05, 'samples': 1824, 'steps': 37, 'loss/train': 8.670278549194336}
07/25/2024 06:23:12 - INFO - __main__ - Step 39: {'lr': 2.7142857142857144e-05, 'samples': 1872, 'steps': 38, 'loss/train': 8.710525512695312}
07/25/2024 06:23:13 - INFO - __main__ - Step 40: {'lr': 2.7857142857142858e-05, 'samples': 1920, 'steps': 39, 'loss/train': 7.902089595794678}
07/25/2024 06:23:13 - INFO - __main__ - Step 41: {'lr': 2.857142857142857e-05, 'samples': 1968, 'steps': 40, 'loss/train': 8.400484085083008}
07/25/2024 06:23:13 - INFO - __main__ - Step 42: {'lr': 2.9285714285714288e-05, 'samples': 2016, 'steps': 41, 'loss/train': 8.789310455322266}
07/25/2024 06:23:14 - INFO - __main__ - Step 43: {'lr': 3e-05, 'samples': 2064, 'steps': 42, 'loss/train': 8.754344940185547}
07/25/2024 06:23:14 - INFO - __main__ - Step 44: {'lr': 3.071428571428572e-05, 'samples': 2112, 'steps': 43, 'loss/train': 8.84192943572998}
07/25/2024 06:23:14 - INFO - __main__ - Step 45: {'lr': 3.142857142857143e-05, 'samples': 2160, 'steps': 44, 'loss/train': 8.784793853759766}
07/25/2024 06:23:14 - INFO - __main__ - Step 46: {'lr': 3.214285714285714e-05, 'samples': 2208, 'steps': 45, 'loss/train': 8.67403793334961}
07/25/2024 06:23:15 - INFO - __main__ - Step 47: {'lr': 3.285714285714286e-05, 'samples': 2256, 'steps': 46, 'loss/train': 8.51427173614502}
07/25/2024 06:23:15 - INFO - __main__ - Step 48: {'lr': 3.357142857142857e-05, 'samples': 2304, 'steps': 47, 'loss/train': 8.48193073272705}
07/25/2024 06:23:15 - INFO - __main__ - Step 49: {'lr': 3.428571428571429e-05, 'samples': 2352, 'steps': 48, 'loss/train': 8.518038749694824}
07/25/2024 06:23:15 - INFO - __main__ - Step 50: {'lr': 3.5000000000000004e-05, 'samples': 2400, 'steps': 49, 'loss/train': 8.63569450378418}
07/25/2024 06:23:16 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:23:16 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:23:19 - INFO - __main__ - Step 50: {'loss/eval': 8.551246643066406, 'perplexity': 5173.19970703125}
07/25/2024 06:23:20 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:23:20 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:23:20 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:24:11 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
   4d63b0c..304deea  celestial-aardvark-128 -> celestial-aardvark-128

07/25/2024 06:24:11 - INFO - __main__ - Step 51: {'lr': 3.571428571428571e-05, 'samples': 2448, 'steps': 50, 'loss/train': 8.343396186828613}
07/25/2024 06:24:11 - INFO - __main__ - Step 52: {'lr': 3.642857142857143e-05, 'samples': 2496, 'steps': 51, 'loss/train': 8.461634635925293}
07/25/2024 06:24:12 - INFO - __main__ - Step 53: {'lr': 3.7142857142857143e-05, 'samples': 2544, 'steps': 52, 'loss/train': 8.43316650390625}
07/25/2024 06:24:12 - INFO - __main__ - Step 54: {'lr': 3.7857142857142864e-05, 'samples': 2592, 'steps': 53, 'loss/train': 8.464268684387207}
07/25/2024 06:24:12 - INFO - __main__ - Step 55: {'lr': 3.857142857142857e-05, 'samples': 2640, 'steps': 54, 'loss/train': 8.371450424194336}
07/25/2024 06:24:12 - INFO - __main__ - Step 56: {'lr': 3.928571428571428e-05, 'samples': 2688, 'steps': 55, 'loss/train': 8.155680656433105}
07/25/2024 06:24:13 - INFO - __main__ - Step 57: {'lr': 4e-05, 'samples': 2736, 'steps': 56, 'loss/train': 8.359997749328613}
07/25/2024 06:24:13 - INFO - __main__ - Step 58: {'lr': 4.0714285714285717e-05, 'samples': 2784, 'steps': 57, 'loss/train': 7.883953094482422}
07/25/2024 06:24:13 - INFO - __main__ - Step 59: {'lr': 4.142857142857143e-05, 'samples': 2832, 'steps': 58, 'loss/train': 8.425983428955078}
07/25/2024 06:24:14 - INFO - __main__ - Step 60: {'lr': 4.214285714285714e-05, 'samples': 2880, 'steps': 59, 'loss/train': 8.220914840698242}
07/25/2024 06:24:14 - INFO - __main__ - Step 61: {'lr': 4.2857142857142856e-05, 'samples': 2928, 'steps': 60, 'loss/train': 8.216103553771973}
07/25/2024 06:24:14 - INFO - __main__ - Step 62: {'lr': 4.3571428571428576e-05, 'samples': 2976, 'steps': 61, 'loss/train': 8.129951477050781}
07/25/2024 06:24:14 - INFO - __main__ - Step 63: {'lr': 4.428571428571428e-05, 'samples': 3024, 'steps': 62, 'loss/train': 7.993805885314941}
07/25/2024 06:24:15 - INFO - __main__ - Step 64: {'lr': 4.4999999999999996e-05, 'samples': 3072, 'steps': 63, 'loss/train': 6.955376625061035}
07/25/2024 06:24:15 - INFO - __main__ - Step 65: {'lr': 4.5714285714285716e-05, 'samples': 3120, 'steps': 64, 'loss/train': 7.9038238525390625}
07/25/2024 06:24:15 - INFO - __main__ - Step 66: {'lr': 4.642857142857143e-05, 'samples': 3168, 'steps': 65, 'loss/train': 7.659880638122559}
07/25/2024 06:24:16 - INFO - __main__ - Step 67: {'lr': 4.714285714285715e-05, 'samples': 3216, 'steps': 66, 'loss/train': 7.462357521057129}
07/25/2024 06:24:16 - INFO - __main__ - Step 68: {'lr': 4.7857142857142856e-05, 'samples': 3264, 'steps': 67, 'loss/train': 7.9803571701049805}
07/25/2024 06:24:16 - INFO - __main__ - Step 69: {'lr': 4.857142857142857e-05, 'samples': 3312, 'steps': 68, 'loss/train': 7.895639896392822}
07/25/2024 06:24:16 - INFO - __main__ - Step 70: {'lr': 4.928571428571429e-05, 'samples': 3360, 'steps': 69, 'loss/train': 7.726537704467773}
07/25/2024 06:24:17 - INFO - __main__ - Step 71: {'lr': 5e-05, 'samples': 3408, 'steps': 70, 'loss/train': 7.8505425453186035}
07/25/2024 06:24:17 - INFO - __main__ - Step 72: {'lr': 5.0714285714285716e-05, 'samples': 3456, 'steps': 71, 'loss/train': 7.492800235748291}
07/25/2024 06:24:17 - INFO - __main__ - Step 73: {'lr': 5.142857142857143e-05, 'samples': 3504, 'steps': 72, 'loss/train': 7.890054225921631}
07/25/2024 06:24:18 - INFO - __main__ - Step 74: {'lr': 5.214285714285714e-05, 'samples': 3552, 'steps': 73, 'loss/train': 7.429488182067871}
07/25/2024 06:24:18 - INFO - __main__ - Step 75: {'lr': 5.285714285714286e-05, 'samples': 3600, 'steps': 74, 'loss/train': 7.520913600921631}
07/25/2024 06:24:18 - INFO - __main__ - Step 76: {'lr': 5.357142857142857e-05, 'samples': 3648, 'steps': 75, 'loss/train': 7.66839075088501}
07/25/2024 06:24:18 - INFO - __main__ - Step 77: {'lr': 5.428571428571429e-05, 'samples': 3696, 'steps': 76, 'loss/train': 7.810487270355225}
07/25/2024 06:24:19 - INFO - __main__ - Step 78: {'lr': 5.5e-05, 'samples': 3744, 'steps': 77, 'loss/train': 7.009271621704102}
07/25/2024 06:24:19 - INFO - __main__ - Step 79: {'lr': 5.5714285714285715e-05, 'samples': 3792, 'steps': 78, 'loss/train': 7.631109714508057}
07/25/2024 06:24:19 - INFO - __main__ - Step 80: {'lr': 5.642857142857143e-05, 'samples': 3840, 'steps': 79, 'loss/train': 6.9839606285095215}
07/25/2024 06:24:20 - INFO - __main__ - Step 81: {'lr': 5.714285714285714e-05, 'samples': 3888, 'steps': 80, 'loss/train': 7.642471790313721}
07/25/2024 06:24:20 - INFO - __main__ - Step 82: {'lr': 5.7857142857142855e-05, 'samples': 3936, 'steps': 81, 'loss/train': 7.183259010314941}
07/25/2024 06:24:20 - INFO - __main__ - Step 83: {'lr': 5.8571428571428575e-05, 'samples': 3984, 'steps': 82, 'loss/train': 7.3919596672058105}
07/25/2024 06:24:20 - INFO - __main__ - Step 84: {'lr': 5.928571428571429e-05, 'samples': 4032, 'steps': 83, 'loss/train': 7.52573299407959}
07/25/2024 06:24:21 - INFO - __main__ - Step 85: {'lr': 6e-05, 'samples': 4080, 'steps': 84, 'loss/train': 7.169320583343506}
07/25/2024 06:24:21 - INFO - __main__ - Step 86: {'lr': 6.0714285714285715e-05, 'samples': 4128, 'steps': 85, 'loss/train': 7.095631122589111}
07/25/2024 06:24:21 - INFO - __main__ - Step 87: {'lr': 6.142857142857143e-05, 'samples': 4176, 'steps': 86, 'loss/train': 7.257204532623291}
07/25/2024 06:24:21 - INFO - __main__ - Step 88: {'lr': 6.214285714285714e-05, 'samples': 4224, 'steps': 87, 'loss/train': 6.010106563568115}
07/25/2024 06:24:22 - INFO - __main__ - Step 89: {'lr': 6.285714285714286e-05, 'samples': 4272, 'steps': 88, 'loss/train': 7.189196586608887}
07/25/2024 06:24:22 - INFO - __main__ - Step 90: {'lr': 6.357142857142857e-05, 'samples': 4320, 'steps': 89, 'loss/train': 6.902089595794678}
07/25/2024 06:24:22 - INFO - __main__ - Step 91: {'lr': 6.428571428571427e-05, 'samples': 4368, 'steps': 90, 'loss/train': 6.5942535400390625}
07/25/2024 06:24:23 - INFO - __main__ - Step 92: {'lr': 6.500000000000001e-05, 'samples': 4416, 'steps': 91, 'loss/train': 7.392148017883301}
07/25/2024 06:24:23 - INFO - __main__ - Step 93: {'lr': 6.571428571428571e-05, 'samples': 4464, 'steps': 92, 'loss/train': 6.586553573608398}
07/25/2024 06:24:23 - INFO - __main__ - Step 94: {'lr': 6.642857142857143e-05, 'samples': 4512, 'steps': 93, 'loss/train': 7.5296549797058105}
07/25/2024 06:24:23 - INFO - __main__ - Step 95: {'lr': 6.714285714285714e-05, 'samples': 4560, 'steps': 94, 'loss/train': 7.048985481262207}
07/25/2024 06:24:24 - INFO - __main__ - Step 96: {'lr': 6.785714285714285e-05, 'samples': 4608, 'steps': 95, 'loss/train': 4.687469959259033}
07/25/2024 06:24:24 - INFO - __main__ - Step 97: {'lr': 6.857142857142858e-05, 'samples': 4656, 'steps': 96, 'loss/train': 7.1623854637146}
07/25/2024 06:24:24 - INFO - __main__ - Step 98: {'lr': 6.928571428571429e-05, 'samples': 4704, 'steps': 97, 'loss/train': 6.722190856933594}
07/25/2024 06:24:25 - INFO - __main__ - Step 99: {'lr': 7.000000000000001e-05, 'samples': 4752, 'steps': 98, 'loss/train': 6.930887699127197}
07/25/2024 06:24:25 - INFO - __main__ - Step 100: {'lr': 7.071428571428571e-05, 'samples': 4800, 'steps': 99, 'loss/train': 7.2268805503845215}
07/25/2024 06:24:25 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:24:25 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:24:28 - INFO - __main__ - Step 100: {'loss/eval': 7.000552177429199, 'perplexity': 1097.2388916015625}
07/25/2024 06:24:29 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:24:29 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:24:29 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:25:31 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream.
07/25/2024 06:25:31 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:25:53 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
   304deea..1e419a3  celestial-aardvark-128 -> celestial-aardvark-128

07/25/2024 06:25:54 - INFO - __main__ - Step 101: {'lr': 7.142857142857142e-05, 'samples': 4848, 'steps': 100, 'loss/train': 6.872276306152344}
07/25/2024 06:25:55 - INFO - __main__ - Step 102: {'lr': 7.214285714285715e-05, 'samples': 4896, 'steps': 101, 'loss/train': 5.04807710647583}
07/25/2024 06:25:55 - INFO - __main__ - Step 103: {'lr': 7.285714285714286e-05, 'samples': 4944, 'steps': 102, 'loss/train': 6.8386383056640625}
07/25/2024 06:25:55 - INFO - __main__ - Step 104: {'lr': 7.357142857142857e-05, 'samples': 4992, 'steps': 103, 'loss/train': 6.707127571105957}
07/25/2024 06:25:56 - INFO - __main__ - Step 105: {'lr': 7.428571428571429e-05, 'samples': 5040, 'steps': 104, 'loss/train': 6.885215759277344}
07/25/2024 06:25:56 - INFO - __main__ - Step 106: {'lr': 7.5e-05, 'samples': 5088, 'steps': 105, 'loss/train': 6.762844562530518}
07/25/2024 06:25:56 - INFO - __main__ - Step 107: {'lr': 7.571428571428573e-05, 'samples': 5136, 'steps': 106, 'loss/train': 6.92085599899292}
07/25/2024 06:25:56 - INFO - __main__ - Step 108: {'lr': 7.642857142857143e-05, 'samples': 5184, 'steps': 107, 'loss/train': 6.639281749725342}
07/25/2024 06:25:57 - INFO - __main__ - Step 109: {'lr': 7.714285714285714e-05, 'samples': 5232, 'steps': 108, 'loss/train': 6.710461616516113}
07/25/2024 06:25:57 - INFO - __main__ - Step 110: {'lr': 7.785714285714286e-05, 'samples': 5280, 'steps': 109, 'loss/train': 3.4145185947418213}
07/25/2024 06:25:57 - INFO - __main__ - Step 111: {'lr': 7.857142857142857e-05, 'samples': 5328, 'steps': 110, 'loss/train': 6.69966983795166}
07/25/2024 06:25:58 - INFO - __main__ - Step 112: {'lr': 7.928571428571429e-05, 'samples': 5376, 'steps': 111, 'loss/train': 6.780115127563477}
07/25/2024 06:25:58 - INFO - __main__ - Step 113: {'lr': 8e-05, 'samples': 5424, 'steps': 112, 'loss/train': 6.512848377227783}
07/25/2024 06:25:58 - INFO - __main__ - Step 114: {'lr': 8.071428571428571e-05, 'samples': 5472, 'steps': 113, 'loss/train': 6.558418273925781}
07/25/2024 06:25:58 - INFO - __main__ - Step 115: {'lr': 8.142857142857143e-05, 'samples': 5520, 'steps': 114, 'loss/train': 6.531116485595703}
07/25/2024 06:25:59 - INFO - __main__ - Step 116: {'lr': 8.214285714285714e-05, 'samples': 5568, 'steps': 115, 'loss/train': 6.557308197021484}
07/25/2024 06:25:59 - INFO - __main__ - Step 117: {'lr': 8.285714285714286e-05, 'samples': 5616, 'steps': 116, 'loss/train': 6.023952960968018}
07/25/2024 06:25:59 - INFO - __main__ - Step 118: {'lr': 8.357142857142858e-05, 'samples': 5664, 'steps': 117, 'loss/train': 7.063660144805908}
07/25/2024 06:26:00 - INFO - __main__ - Step 119: {'lr': 8.428571428571429e-05, 'samples': 5712, 'steps': 118, 'loss/train': 6.6882853507995605}
07/25/2024 06:26:00 - INFO - __main__ - Step 120: {'lr': 8.5e-05, 'samples': 5760, 'steps': 119, 'loss/train': 5.413237571716309}
07/25/2024 06:26:00 - INFO - __main__ - Step 121: {'lr': 8.571428571428571e-05, 'samples': 5808, 'steps': 120, 'loss/train': 6.166462421417236}
07/25/2024 06:26:00 - INFO - __main__ - Step 122: {'lr': 8.642857142857143e-05, 'samples': 5856, 'steps': 121, 'loss/train': 6.413567543029785}
07/25/2024 06:26:01 - INFO - __main__ - Step 123: {'lr': 8.714285714285715e-05, 'samples': 5904, 'steps': 122, 'loss/train': 6.3801727294921875}
07/25/2024 06:26:01 - INFO - __main__ - Step 124: {'lr': 8.785714285714286e-05, 'samples': 5952, 'steps': 123, 'loss/train': 7.042605400085449}
07/25/2024 06:26:01 - INFO - __main__ - Step 125: {'lr': 8.857142857142857e-05, 'samples': 6000, 'steps': 124, 'loss/train': 6.735599517822266}
07/25/2024 06:26:01 - INFO - __main__ - Step 126: {'lr': 8.928571428571429e-05, 'samples': 6048, 'steps': 125, 'loss/train': 6.620289325714111}
07/25/2024 06:26:02 - INFO - __main__ - Step 127: {'lr': 8.999999999999999e-05, 'samples': 6096, 'steps': 126, 'loss/train': 6.738864421844482}
07/25/2024 06:26:02 - INFO - __main__ - Step 128: {'lr': 9.071428571428573e-05, 'samples': 6144, 'steps': 127, 'loss/train': 6.406912326812744}
07/25/2024 06:26:02 - INFO - __main__ - Step 129: {'lr': 9.142857142857143e-05, 'samples': 6192, 'steps': 128, 'loss/train': 6.422929286956787}
07/25/2024 06:26:03 - INFO - __main__ - Step 130: {'lr': 9.214285714285714e-05, 'samples': 6240, 'steps': 129, 'loss/train': 6.476966381072998}
07/25/2024 06:26:03 - INFO - __main__ - Step 131: {'lr': 9.285714285714286e-05, 'samples': 6288, 'steps': 130, 'loss/train': 6.289211273193359}
07/25/2024 06:26:03 - INFO - __main__ - Step 132: {'lr': 9.357142857142857e-05, 'samples': 6336, 'steps': 131, 'loss/train': 6.4881696701049805}
07/25/2024 06:26:03 - INFO - __main__ - Step 133: {'lr': 9.42857142857143e-05, 'samples': 6384, 'steps': 132, 'loss/train': 6.840321063995361}
07/25/2024 06:26:04 - INFO - __main__ - Step 134: {'lr': 9.5e-05, 'samples': 6432, 'steps': 133, 'loss/train': 6.22948694229126}
07/25/2024 06:26:04 - INFO - __main__ - Step 135: {'lr': 9.571428571428571e-05, 'samples': 6480, 'steps': 134, 'loss/train': 5.924211025238037}
07/25/2024 06:26:04 - INFO - __main__ - Step 136: {'lr': 9.642857142857143e-05, 'samples': 6528, 'steps': 135, 'loss/train': 8.402527809143066}
07/25/2024 06:26:05 - INFO - __main__ - Step 137: {'lr': 9.714285714285714e-05, 'samples': 6576, 'steps': 136, 'loss/train': 6.357081413269043}
07/25/2024 06:26:05 - INFO - __main__ - Step 138: {'lr': 9.785714285714286e-05, 'samples': 6624, 'steps': 137, 'loss/train': 6.335728168487549}
07/25/2024 06:26:05 - INFO - __main__ - Step 139: {'lr': 9.857142857142858e-05, 'samples': 6672, 'steps': 138, 'loss/train': 6.388386249542236}
07/25/2024 06:26:05 - INFO - __main__ - Step 140: {'lr': 9.928571428571428e-05, 'samples': 6720, 'steps': 139, 'loss/train': 6.144318103790283}
07/25/2024 06:26:06 - INFO - __main__ - Step 141: {'lr': 0.0001, 'samples': 6768, 'steps': 140, 'loss/train': 5.887519359588623}
07/25/2024 06:26:06 - INFO - __main__ - Step 142: {'lr': 0.00010071428571428571, 'samples': 6816, 'steps': 141, 'loss/train': 6.515809059143066}
07/25/2024 06:26:06 - INFO - __main__ - Step 143: {'lr': 0.00010142857142857143, 'samples': 6864, 'steps': 142, 'loss/train': 6.273582458496094}
07/25/2024 06:26:07 - INFO - __main__ - Step 144: {'lr': 0.00010214285714285715, 'samples': 6912, 'steps': 143, 'loss/train': 6.12056303024292}
07/25/2024 06:26:07 - INFO - __main__ - Step 145: {'lr': 0.00010285714285714286, 'samples': 6960, 'steps': 144, 'loss/train': 6.281930446624756}
07/25/2024 06:26:07 - INFO - __main__ - Step 146: {'lr': 0.00010357142857142858, 'samples': 7008, 'steps': 145, 'loss/train': 6.347898483276367}
07/25/2024 06:26:07 - INFO - __main__ - Step 147: {'lr': 0.00010428571428571428, 'samples': 7056, 'steps': 146, 'loss/train': 6.053178787231445}
07/25/2024 06:26:08 - INFO - __main__ - Step 148: {'lr': 0.000105, 'samples': 7104, 'steps': 147, 'loss/train': 6.299071311950684}
07/25/2024 06:26:08 - INFO - __main__ - Step 149: {'lr': 0.00010571428571428572, 'samples': 7152, 'steps': 148, 'loss/train': 6.214033603668213}
07/25/2024 06:26:08 - INFO - __main__ - Step 150: {'lr': 0.00010642857142857143, 'samples': 7200, 'steps': 149, 'loss/train': 6.36629056930542}
07/25/2024 06:26:08 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:26:08 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:26:12 - INFO - __main__ - Step 150: {'loss/eval': 6.422665119171143, 'perplexity': 615.6417236328125}
07/25/2024 06:26:12 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:26:12 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:26:13 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:27:15 - WARNING - huggingface_hub.repository - Several commits (3) will be pushed upstream.
07/25/2024 06:27:15 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:27:38 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
   1e419a3..dcc8019  celestial-aardvark-128 -> celestial-aardvark-128

07/25/2024 06:27:38 - INFO - __main__ - Step 151: {'lr': 0.00010714285714285714, 'samples': 7248, 'steps': 150, 'loss/train': 6.574682235717773}
07/25/2024 06:27:38 - INFO - __main__ - Step 152: {'lr': 0.00010785714285714286, 'samples': 7296, 'steps': 151, 'loss/train': 5.2919840812683105}
07/25/2024 06:27:39 - INFO - __main__ - Step 153: {'lr': 0.00010857142857142858, 'samples': 7344, 'steps': 152, 'loss/train': 6.282163143157959}
07/25/2024 06:27:39 - INFO - __main__ - Step 154: {'lr': 0.0001092857142857143, 'samples': 7392, 'steps': 153, 'loss/train': 6.462711334228516}
07/25/2024 06:27:39 - INFO - __main__ - Step 155: {'lr': 0.00011, 'samples': 7440, 'steps': 154, 'loss/train': 5.595396518707275}
07/25/2024 06:27:39 - INFO - __main__ - Step 156: {'lr': 0.00011071428571428571, 'samples': 7488, 'steps': 155, 'loss/train': 6.128833293914795}
07/25/2024 06:27:40 - INFO - __main__ - Step 157: {'lr': 0.00011142857142857143, 'samples': 7536, 'steps': 156, 'loss/train': 6.035909652709961}
07/25/2024 06:27:40 - INFO - __main__ - Step 158: {'lr': 0.00011214285714285715, 'samples': 7584, 'steps': 157, 'loss/train': 6.275477886199951}
07/25/2024 06:27:40 - INFO - __main__ - Step 159: {'lr': 0.00011285714285714286, 'samples': 7632, 'steps': 158, 'loss/train': 6.1195969581604}
07/25/2024 06:27:40 - INFO - __main__ - Step 160: {'lr': 0.00011357142857142858, 'samples': 7680, 'steps': 159, 'loss/train': 8.316116333007812}
07/25/2024 06:27:41 - INFO - __main__ - Step 161: {'lr': 0.00011428571428571428, 'samples': 7728, 'steps': 160, 'loss/train': 6.287449836730957}
07/25/2024 06:27:41 - INFO - __main__ - Step 162: {'lr': 0.000115, 'samples': 7776, 'steps': 161, 'loss/train': 5.879787445068359}
07/25/2024 06:27:41 - INFO - __main__ - Step 163: {'lr': 0.00011571428571428571, 'samples': 7824, 'steps': 162, 'loss/train': 6.221517086029053}
07/25/2024 06:27:42 - INFO - __main__ - Step 164: {'lr': 0.00011642857142857143, 'samples': 7872, 'steps': 163, 'loss/train': 5.967787265777588}
07/25/2024 06:27:42 - INFO - __main__ - Step 165: {'lr': 0.00011714285714285715, 'samples': 7920, 'steps': 164, 'loss/train': 6.09508752822876}
07/25/2024 06:27:42 - INFO - __main__ - Step 166: {'lr': 0.00011785714285714286, 'samples': 7968, 'steps': 165, 'loss/train': 6.462942123413086}
07/25/2024 06:27:42 - INFO - __main__ - Step 167: {'lr': 0.00011857142857142858, 'samples': 8016, 'steps': 166, 'loss/train': 6.146663188934326}
07/25/2024 06:27:43 - INFO - __main__ - Step 168: {'lr': 0.00011928571428571428, 'samples': 8064, 'steps': 167, 'loss/train': 6.4038286209106445}
07/25/2024 06:27:43 - INFO - __main__ - Step 169: {'lr': 0.00012, 'samples': 8112, 'steps': 168, 'loss/train': 6.267633438110352}
07/25/2024 06:27:43 - INFO - __main__ - Step 170: {'lr': 0.00012071428571428572, 'samples': 8160, 'steps': 169, 'loss/train': 6.64249324798584}
07/25/2024 06:27:44 - INFO - __main__ - Step 171: {'lr': 0.00012142857142857143, 'samples': 8208, 'steps': 170, 'loss/train': 6.448271751403809}
07/25/2024 06:27:44 - INFO - __main__ - Step 172: {'lr': 0.00012214285714285715, 'samples': 8256, 'steps': 171, 'loss/train': 6.485412120819092}
07/25/2024 06:27:44 - INFO - __main__ - Step 173: {'lr': 0.00012285714285714287, 'samples': 8304, 'steps': 172, 'loss/train': 6.213407516479492}
07/25/2024 06:27:44 - INFO - __main__ - Step 174: {'lr': 0.00012357142857142856, 'samples': 8352, 'steps': 173, 'loss/train': 5.832103729248047}
07/25/2024 06:27:45 - INFO - __main__ - Step 175: {'lr': 0.00012428571428571428, 'samples': 8400, 'steps': 174, 'loss/train': 5.645206928253174}
07/25/2024 06:27:45 - INFO - __main__ - Step 176: {'lr': 0.000125, 'samples': 8448, 'steps': 175, 'loss/train': 5.942577838897705}
07/25/2024 06:27:45 - INFO - __main__ - Step 177: {'lr': 0.00012571428571428572, 'samples': 8496, 'steps': 176, 'loss/train': 6.108009338378906}
07/25/2024 06:27:46 - INFO - __main__ - Step 178: {'lr': 0.00012642857142857142, 'samples': 8544, 'steps': 177, 'loss/train': 6.048696994781494}
07/25/2024 06:27:46 - INFO - __main__ - Step 179: {'lr': 0.00012714285714285714, 'samples': 8592, 'steps': 178, 'loss/train': 6.014152526855469}
07/25/2024 06:27:46 - INFO - __main__ - Step 180: {'lr': 0.00012785714285714286, 'samples': 8640, 'steps': 179, 'loss/train': 6.590332508087158}
07/25/2024 06:27:46 - INFO - __main__ - Step 181: {'lr': 0.00012857142857142855, 'samples': 8688, 'steps': 180, 'loss/train': 6.095800399780273}
07/25/2024 06:27:47 - INFO - __main__ - Step 182: {'lr': 0.0001292857142857143, 'samples': 8736, 'steps': 181, 'loss/train': 5.968374729156494}
07/25/2024 06:27:47 - INFO - __main__ - Step 183: {'lr': 0.00013000000000000002, 'samples': 8784, 'steps': 182, 'loss/train': 6.073035717010498}
07/25/2024 06:27:47 - INFO - __main__ - Step 184: {'lr': 0.00013071428571428574, 'samples': 8832, 'steps': 183, 'loss/train': 7.681509494781494}
07/25/2024 06:27:47 - INFO - __main__ - Step 185: {'lr': 0.00013142857142857143, 'samples': 8880, 'steps': 184, 'loss/train': 5.806171417236328}
07/25/2024 06:27:48 - INFO - __main__ - Step 186: {'lr': 0.00013214285714285715, 'samples': 8928, 'steps': 185, 'loss/train': 5.868297576904297}
07/25/2024 06:27:48 - INFO - __main__ - Step 187: {'lr': 0.00013285714285714287, 'samples': 8976, 'steps': 186, 'loss/train': 5.532838344573975}
07/25/2024 06:27:48 - INFO - __main__ - Step 188: {'lr': 0.00013357142857142856, 'samples': 9024, 'steps': 187, 'loss/train': 6.210916042327881}
07/25/2024 06:27:49 - INFO - __main__ - Step 189: {'lr': 0.00013428571428571428, 'samples': 9072, 'steps': 188, 'loss/train': 5.803860187530518}
07/25/2024 06:27:49 - INFO - __main__ - Step 190: {'lr': 0.000135, 'samples': 9120, 'steps': 189, 'loss/train': 6.666335105895996}
07/25/2024 06:27:49 - INFO - __main__ - Step 191: {'lr': 0.0001357142857142857, 'samples': 9168, 'steps': 190, 'loss/train': 5.624790668487549}
07/25/2024 06:27:49 - INFO - __main__ - Step 192: {'lr': 0.00013642857142857144, 'samples': 9216, 'steps': 191, 'loss/train': 5.217100143432617}
07/25/2024 06:27:50 - INFO - __main__ - Step 193: {'lr': 0.00013714285714285716, 'samples': 9264, 'steps': 192, 'loss/train': 5.951303482055664}
07/25/2024 06:27:50 - INFO - __main__ - Step 194: {'lr': 0.00013785714285714285, 'samples': 9312, 'steps': 193, 'loss/train': 5.851853847503662}
07/25/2024 06:27:50 - INFO - __main__ - Step 195: {'lr': 0.00013857142857142857, 'samples': 9360, 'steps': 194, 'loss/train': 5.776468276977539}
07/25/2024 06:27:51 - INFO - __main__ - Step 196: {'lr': 0.0001392857142857143, 'samples': 9408, 'steps': 195, 'loss/train': 5.7882866859436035}
07/25/2024 06:27:51 - INFO - __main__ - Step 197: {'lr': 0.00014000000000000001, 'samples': 9456, 'steps': 196, 'loss/train': 5.621963024139404}
07/25/2024 06:27:51 - INFO - __main__ - Step 198: {'lr': 0.0001407142857142857, 'samples': 9504, 'steps': 197, 'loss/train': 5.277397632598877}
07/25/2024 06:27:51 - INFO - __main__ - Step 199: {'lr': 0.00014142857142857143, 'samples': 9552, 'steps': 198, 'loss/train': 5.9324951171875}
07/25/2024 06:27:52 - INFO - __main__ - Step 200: {'lr': 0.00014214285714285715, 'samples': 9600, 'steps': 199, 'loss/train': 6.0901618003845215}
07/25/2024 06:27:52 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:27:52 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:27:55 - INFO - __main__ - Step 200: {'loss/eval': 6.142789840698242, 'perplexity': 465.3500061035156}
07/25/2024 06:27:56 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:27:56 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:27:56 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:28:59 - WARNING - huggingface_hub.repository - Several commits (4) will be pushed upstream.
07/25/2024 06:28:59 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:29:25 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
   dcc8019..d02b805  celestial-aardvark-128 -> celestial-aardvark-128

07/25/2024 06:29:25 - INFO - __main__ - Step 201: {'lr': 0.00014285714285714284, 'samples': 9648, 'steps': 200, 'loss/train': 5.745926856994629}
07/25/2024 06:29:25 - INFO - __main__ - Step 202: {'lr': 0.0001435714285714286, 'samples': 9696, 'steps': 201, 'loss/train': 6.288934707641602}
07/25/2024 06:29:26 - INFO - __main__ - Step 203: {'lr': 0.0001442857142857143, 'samples': 9744, 'steps': 202, 'loss/train': 6.304495811462402}
07/25/2024 06:29:26 - INFO - __main__ - Step 204: {'lr': 0.000145, 'samples': 9792, 'steps': 203, 'loss/train': 6.896693706512451}
07/25/2024 06:29:26 - INFO - __main__ - Step 205: {'lr': 0.00014571428571428572, 'samples': 9840, 'steps': 204, 'loss/train': 5.75565767288208}
07/25/2024 06:29:26 - INFO - __main__ - Step 206: {'lr': 0.00014642857142857144, 'samples': 9888, 'steps': 205, 'loss/train': 6.053487300872803}
07/25/2024 06:29:27 - INFO - __main__ - Step 207: {'lr': 0.00014714285714285713, 'samples': 9936, 'steps': 206, 'loss/train': 5.872729301452637}
07/25/2024 06:29:27 - INFO - __main__ - Step 208: {'lr': 0.00014785714285714285, 'samples': 9984, 'steps': 207, 'loss/train': 7.389420509338379}
07/25/2024 06:29:27 - INFO - __main__ - Step 209: {'lr': 0.00014857142857142857, 'samples': 10032, 'steps': 208, 'loss/train': 6.749051570892334}
07/25/2024 06:29:27 - INFO - __main__ - Step 210: {'lr': 0.0001492857142857143, 'samples': 10080, 'steps': 209, 'loss/train': 5.964937210083008}
07/25/2024 06:29:28 - INFO - __main__ - Step 211: {'lr': 0.00015, 'samples': 10128, 'steps': 210, 'loss/train': 6.29296350479126}
07/25/2024 06:29:28 - INFO - __main__ - Step 212: {'lr': 0.0001507142857142857, 'samples': 10176, 'steps': 211, 'loss/train': 6.124290466308594}
07/25/2024 06:29:28 - INFO - __main__ - Step 213: {'lr': 0.00015142857142857145, 'samples': 10224, 'steps': 212, 'loss/train': 6.875829219818115}
07/25/2024 06:29:29 - INFO - __main__ - Step 214: {'lr': 0.00015214285714285715, 'samples': 10272, 'steps': 213, 'loss/train': 6.973008155822754}
07/25/2024 06:29:29 - INFO - __main__ - Step 215: {'lr': 0.00015285714285714287, 'samples': 10320, 'steps': 214, 'loss/train': 6.136086940765381}
07/25/2024 06:29:29 - INFO - __main__ - Step 216: {'lr': 0.0001535714285714286, 'samples': 10368, 'steps': 215, 'loss/train': 5.827876567840576}
07/25/2024 06:29:29 - INFO - __main__ - Step 217: {'lr': 0.00015428571428571428, 'samples': 10416, 'steps': 216, 'loss/train': 6.297738552093506}
07/25/2024 06:29:30 - INFO - __main__ - Step 218: {'lr': 0.000155, 'samples': 10464, 'steps': 217, 'loss/train': 5.124302387237549}
07/25/2024 06:29:30 - INFO - __main__ - Step 219: {'lr': 0.00015571428571428572, 'samples': 10512, 'steps': 218, 'loss/train': 5.82398796081543}
07/25/2024 06:29:30 - INFO - __main__ - Step 220: {'lr': 0.0001564285714285714, 'samples': 10560, 'steps': 219, 'loss/train': 5.920914649963379}
07/25/2024 06:29:31 - INFO - __main__ - Step 221: {'lr': 0.00015714285714285713, 'samples': 10608, 'steps': 220, 'loss/train': 5.506519317626953}
07/25/2024 06:29:31 - INFO - __main__ - Step 222: {'lr': 0.00015785714285714285, 'samples': 10656, 'steps': 221, 'loss/train': 5.194490432739258}
07/25/2024 06:29:31 - INFO - __main__ - Step 223: {'lr': 0.00015857142857142857, 'samples': 10704, 'steps': 222, 'loss/train': 6.241917610168457}
07/25/2024 06:29:31 - INFO - __main__ - Step 224: {'lr': 0.0001592857142857143, 'samples': 10752, 'steps': 223, 'loss/train': 5.662716388702393}
07/25/2024 06:29:32 - INFO - __main__ - Step 225: {'lr': 0.00016, 'samples': 10800, 'steps': 224, 'loss/train': 5.275988578796387}
07/25/2024 06:29:32 - INFO - __main__ - Step 226: {'lr': 0.00016071428571428573, 'samples': 10848, 'steps': 225, 'loss/train': 5.916398048400879}
07/25/2024 06:29:32 - INFO - __main__ - Step 227: {'lr': 0.00016142857142857143, 'samples': 10896, 'steps': 226, 'loss/train': 5.93534517288208}
07/25/2024 06:29:33 - INFO - __main__ - Step 228: {'lr': 0.00016214285714285715, 'samples': 10944, 'steps': 227, 'loss/train': 6.050380229949951}
07/25/2024 06:29:33 - INFO - __main__ - Step 229: {'lr': 0.00016285714285714287, 'samples': 10992, 'steps': 228, 'loss/train': 6.600334644317627}
07/25/2024 06:29:33 - INFO - __main__ - Step 230: {'lr': 0.00016357142857142856, 'samples': 11040, 'steps': 229, 'loss/train': 6.150309085845947}
07/25/2024 06:29:33 - INFO - __main__ - Step 231: {'lr': 0.00016428571428571428, 'samples': 11088, 'steps': 230, 'loss/train': 6.019353866577148}
07/25/2024 06:29:34 - INFO - __main__ - Step 232: {'lr': 0.000165, 'samples': 11136, 'steps': 231, 'loss/train': 7.122209548950195}
07/25/2024 06:29:34 - INFO - __main__ - Step 233: {'lr': 0.00016571428571428572, 'samples': 11184, 'steps': 232, 'loss/train': 5.891404151916504}
07/25/2024 06:29:34 - INFO - __main__ - Step 234: {'lr': 0.00016642857142857144, 'samples': 11232, 'steps': 233, 'loss/train': 5.697052955627441}
07/25/2024 06:29:34 - INFO - __main__ - Step 235: {'lr': 0.00016714285714285716, 'samples': 11280, 'steps': 234, 'loss/train': 5.768013954162598}
07/25/2024 06:29:35 - INFO - __main__ - Step 236: {'lr': 0.00016785714285714285, 'samples': 11328, 'steps': 235, 'loss/train': 5.943960666656494}
07/25/2024 06:29:35 - INFO - __main__ - Step 237: {'lr': 0.00016857142857142857, 'samples': 11376, 'steps': 236, 'loss/train': 7.096799850463867}
07/25/2024 06:29:35 - INFO - __main__ - Step 238: {'lr': 0.0001692857142857143, 'samples': 11424, 'steps': 237, 'loss/train': 7.258213996887207}
07/25/2024 06:29:36 - INFO - __main__ - Step 239: {'lr': 0.00017, 'samples': 11472, 'steps': 238, 'loss/train': 5.474708080291748}
07/25/2024 06:29:36 - INFO - __main__ - Step 240: {'lr': 0.0001707142857142857, 'samples': 11520, 'steps': 239, 'loss/train': 5.929581642150879}
07/25/2024 06:29:36 - INFO - __main__ - Step 241: {'lr': 0.00017142857142857143, 'samples': 11568, 'steps': 240, 'loss/train': 5.396873950958252}
07/25/2024 06:29:36 - INFO - __main__ - Step 242: {'lr': 0.00017214285714285715, 'samples': 11616, 'steps': 241, 'loss/train': 5.90254020690918}
07/25/2024 06:29:37 - INFO - __main__ - Step 243: {'lr': 0.00017285714285714287, 'samples': 11664, 'steps': 242, 'loss/train': 5.579410076141357}
07/25/2024 06:29:37 - INFO - __main__ - Step 244: {'lr': 0.00017357142857142859, 'samples': 11712, 'steps': 243, 'loss/train': 6.5500946044921875}
07/25/2024 06:29:37 - INFO - __main__ - Step 245: {'lr': 0.0001742857142857143, 'samples': 11760, 'steps': 244, 'loss/train': 6.13820219039917}
07/25/2024 06:29:38 - INFO - __main__ - Step 246: {'lr': 0.000175, 'samples': 11808, 'steps': 245, 'loss/train': 5.283195972442627}
07/25/2024 06:29:38 - INFO - __main__ - Step 247: {'lr': 0.00017571428571428572, 'samples': 11856, 'steps': 246, 'loss/train': 5.3597211837768555}
07/25/2024 06:29:38 - INFO - __main__ - Step 248: {'lr': 0.00017642857142857144, 'samples': 11904, 'steps': 247, 'loss/train': 5.715787410736084}
07/25/2024 06:29:38 - INFO - __main__ - Step 249: {'lr': 0.00017714285714285713, 'samples': 11952, 'steps': 248, 'loss/train': 5.988589286804199}
07/25/2024 06:29:39 - INFO - __main__ - Step 250: {'lr': 0.00017785714285714285, 'samples': 12000, 'steps': 249, 'loss/train': 6.131600856781006}
07/25/2024 06:29:39 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:29:39 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:29:42 - INFO - __main__ - Step 250: {'loss/eval': 5.960291385650635, 'perplexity': 387.72308349609375}
07/25/2024 06:29:43 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:29:43 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:29:43 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:30:45 - WARNING - huggingface_hub.repository - Several commits (5) will be pushed upstream.
07/25/2024 06:30:45 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:31:13 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
   d02b805..4d31a9f  celestial-aardvark-128 -> celestial-aardvark-128

07/25/2024 06:31:13 - INFO - __main__ - Step 251: {'lr': 0.00017857142857142857, 'samples': 12048, 'steps': 250, 'loss/train': 5.627201557159424}
07/25/2024 06:31:14 - INFO - __main__ - Step 252: {'lr': 0.0001792857142857143, 'samples': 12096, 'steps': 251, 'loss/train': 6.002392292022705}
07/25/2024 06:31:14 - INFO - __main__ - Step 253: {'lr': 0.00017999999999999998, 'samples': 12144, 'steps': 252, 'loss/train': 5.872100353240967}
07/25/2024 06:31:14 - INFO - __main__ - Step 254: {'lr': 0.00018071428571428573, 'samples': 12192, 'steps': 253, 'loss/train': 6.0609612464904785}
07/25/2024 06:31:14 - INFO - __main__ - Step 255: {'lr': 0.00018142857142857145, 'samples': 12240, 'steps': 254, 'loss/train': 6.275620460510254}
07/25/2024 06:31:15 - INFO - __main__ - Step 256: {'lr': 0.00018214285714285714, 'samples': 12288, 'steps': 255, 'loss/train': 6.78406286239624}
07/25/2024 06:31:15 - INFO - __main__ - Step 257: {'lr': 0.00018285714285714286, 'samples': 12336, 'steps': 256, 'loss/train': 6.069532871246338}
07/25/2024 06:31:15 - INFO - __main__ - Step 258: {'lr': 0.00018357142857142858, 'samples': 12384, 'steps': 257, 'loss/train': 5.567933559417725}
07/25/2024 06:31:16 - INFO - __main__ - Step 259: {'lr': 0.00018428571428571428, 'samples': 12432, 'steps': 258, 'loss/train': 6.152994632720947}
07/25/2024 06:31:16 - INFO - __main__ - Step 260: {'lr': 0.000185, 'samples': 12480, 'steps': 259, 'loss/train': 5.771788120269775}
07/25/2024 06:31:16 - INFO - __main__ - Step 261: {'lr': 0.00018571428571428572, 'samples': 12528, 'steps': 260, 'loss/train': 5.717995643615723}
07/25/2024 06:31:16 - INFO - __main__ - Step 262: {'lr': 0.0001864285714285714, 'samples': 12576, 'steps': 261, 'loss/train': 5.839302062988281}
07/25/2024 06:31:17 - INFO - __main__ - Step 263: {'lr': 0.00018714285714285713, 'samples': 12624, 'steps': 262, 'loss/train': 5.257016658782959}
07/25/2024 06:31:17 - INFO - __main__ - Step 264: {'lr': 0.00018785714285714288, 'samples': 12672, 'steps': 263, 'loss/train': 6.241714000701904}
07/25/2024 06:31:17 - INFO - __main__ - Step 265: {'lr': 0.0001885714285714286, 'samples': 12720, 'steps': 264, 'loss/train': 6.639944553375244}
07/25/2024 06:31:17 - INFO - __main__ - Step 266: {'lr': 0.0001892857142857143, 'samples': 12768, 'steps': 265, 'loss/train': 5.12101936340332}
07/25/2024 06:31:18 - INFO - __main__ - Step 267: {'lr': 0.00019, 'samples': 12816, 'steps': 266, 'loss/train': 5.190861701965332}
07/25/2024 06:31:18 - INFO - __main__ - Step 268: {'lr': 0.00019071428571428573, 'samples': 12864, 'steps': 267, 'loss/train': 6.486904621124268}
07/25/2024 06:31:18 - INFO - __main__ - Step 269: {'lr': 0.00019142857142857142, 'samples': 12912, 'steps': 268, 'loss/train': 5.638678073883057}
07/25/2024 06:31:19 - INFO - __main__ - Step 270: {'lr': 0.00019214285714285714, 'samples': 12960, 'steps': 269, 'loss/train': 5.088951110839844}
07/25/2024 06:31:19 - INFO - __main__ - Step 271: {'lr': 0.00019285714285714286, 'samples': 13008, 'steps': 270, 'loss/train': 5.137499809265137}
07/25/2024 06:31:19 - INFO - __main__ - Step 272: {'lr': 0.00019357142857142856, 'samples': 13056, 'steps': 271, 'loss/train': 4.604417324066162}
07/25/2024 06:31:19 - INFO - __main__ - Step 273: {'lr': 0.00019428571428571428, 'samples': 13104, 'steps': 272, 'loss/train': 5.781164646148682}
07/25/2024 06:31:20 - INFO - __main__ - Step 274: {'lr': 0.00019500000000000002, 'samples': 13152, 'steps': 273, 'loss/train': 6.4048309326171875}
07/25/2024 06:31:20 - INFO - __main__ - Step 275: {'lr': 0.00019571428571428572, 'samples': 13200, 'steps': 274, 'loss/train': 6.040492057800293}
07/25/2024 06:31:20 - INFO - __main__ - Step 276: {'lr': 0.00019642857142857144, 'samples': 13248, 'steps': 275, 'loss/train': 5.667052745819092}
07/25/2024 06:31:21 - INFO - __main__ - Step 277: {'lr': 0.00019714285714285716, 'samples': 13296, 'steps': 276, 'loss/train': 5.5247483253479}
07/25/2024 06:31:21 - INFO - __main__ - Step 278: {'lr': 0.00019785714285714288, 'samples': 13344, 'steps': 277, 'loss/train': 5.584035396575928}
07/25/2024 06:31:21 - INFO - __main__ - Step 279: {'lr': 0.00019857142857142857, 'samples': 13392, 'steps': 278, 'loss/train': 5.613864898681641}
07/25/2024 06:31:21 - INFO - __main__ - Step 280: {'lr': 0.0001992857142857143, 'samples': 13440, 'steps': 279, 'loss/train': 5.550878524780273}
07/25/2024 06:31:22 - INFO - __main__ - Step 281: {'lr': 0.0002, 'samples': 13488, 'steps': 280, 'loss/train': 6.560573101043701}
07/25/2024 06:31:22 - INFO - __main__ - Step 282: {'lr': 0.0002007142857142857, 'samples': 13536, 'steps': 281, 'loss/train': 5.38557767868042}
07/25/2024 06:31:22 - INFO - __main__ - Step 283: {'lr': 0.00020142857142857142, 'samples': 13584, 'steps': 282, 'loss/train': 6.759729862213135}
07/25/2024 06:31:23 - INFO - __main__ - Step 284: {'lr': 0.00020214285714285714, 'samples': 13632, 'steps': 283, 'loss/train': 6.179801940917969}
07/25/2024 06:31:23 - INFO - __main__ - Step 285: {'lr': 0.00020285714285714286, 'samples': 13680, 'steps': 284, 'loss/train': 5.904941082000732}
07/25/2024 06:31:23 - INFO - __main__ - Step 286: {'lr': 0.00020357142857142858, 'samples': 13728, 'steps': 285, 'loss/train': 5.76945161819458}
07/25/2024 06:31:23 - INFO - __main__ - Step 287: {'lr': 0.0002042857142857143, 'samples': 13776, 'steps': 286, 'loss/train': 8.2332124710083}
07/25/2024 06:31:24 - INFO - __main__ - Step 288: {'lr': 0.000205, 'samples': 13824, 'steps': 287, 'loss/train': 5.863339900970459}
07/25/2024 06:31:24 - INFO - __main__ - Step 289: {'lr': 0.00020571428571428572, 'samples': 13872, 'steps': 288, 'loss/train': 6.213030815124512}
07/25/2024 06:31:24 - INFO - __main__ - Step 290: {'lr': 0.00020642857142857144, 'samples': 13920, 'steps': 289, 'loss/train': 4.734172821044922}
07/25/2024 06:31:25 - INFO - __main__ - Step 291: {'lr': 0.00020714285714285716, 'samples': 13968, 'steps': 290, 'loss/train': 5.674801349639893}
07/25/2024 06:31:25 - INFO - __main__ - Step 292: {'lr': 0.00020785714285714285, 'samples': 14016, 'steps': 291, 'loss/train': 5.784888744354248}
07/25/2024 06:31:25 - INFO - __main__ - Step 293: {'lr': 0.00020857142857142857, 'samples': 14064, 'steps': 292, 'loss/train': 5.5319390296936035}
07/25/2024 06:31:25 - INFO - __main__ - Step 294: {'lr': 0.0002092857142857143, 'samples': 14112, 'steps': 293, 'loss/train': 5.685769557952881}
07/25/2024 06:31:26 - INFO - __main__ - Step 295: {'lr': 0.00021, 'samples': 14160, 'steps': 294, 'loss/train': 5.418774604797363}
07/25/2024 06:31:26 - INFO - __main__ - Step 296: {'lr': 0.00021071428571428573, 'samples': 14208, 'steps': 295, 'loss/train': 4.068847179412842}
07/25/2024 06:31:26 - INFO - __main__ - Step 297: {'lr': 0.00021142857142857145, 'samples': 14256, 'steps': 296, 'loss/train': 5.367792129516602}
07/25/2024 06:31:26 - INFO - __main__ - Step 298: {'lr': 0.00021214285714285714, 'samples': 14304, 'steps': 297, 'loss/train': 5.713776588439941}
07/25/2024 06:31:27 - INFO - __main__ - Step 299: {'lr': 0.00021285714285714286, 'samples': 14352, 'steps': 298, 'loss/train': 5.603511810302734}
07/25/2024 06:31:27 - INFO - __main__ - Step 300: {'lr': 0.00021357142857142858, 'samples': 14400, 'steps': 299, 'loss/train': 6.163950443267822}
07/25/2024 06:31:27 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:31:27 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:31:31 - INFO - __main__ - Step 300: {'loss/eval': 5.79922342300415, 'perplexity': 330.0431823730469}
07/25/2024 06:31:31 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:31:31 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:31:32 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:32:35 - WARNING - huggingface_hub.repository - Several commits (6) will be pushed upstream.
07/25/2024 06:32:35 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:33:00 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
   4d31a9f..aae7e8d  celestial-aardvark-128 -> celestial-aardvark-128

07/25/2024 06:33:00 - INFO - __main__ - Step 301: {'lr': 0.00021428571428571427, 'samples': 14448, 'steps': 300, 'loss/train': 5.406757354736328}
07/25/2024 06:33:00 - INFO - __main__ - Step 302: {'lr': 0.000215, 'samples': 14496, 'steps': 301, 'loss/train': 5.90996789932251}
07/25/2024 06:33:01 - INFO - __main__ - Step 303: {'lr': 0.00021571428571428571, 'samples': 14544, 'steps': 302, 'loss/train': 6.092479228973389}
07/25/2024 06:33:01 - INFO - __main__ - Step 304: {'lr': 0.00021642857142857143, 'samples': 14592, 'steps': 303, 'loss/train': 5.216100215911865}
07/25/2024 06:33:01 - INFO - __main__ - Step 305: {'lr': 0.00021714285714285715, 'samples': 14640, 'steps': 304, 'loss/train': 5.621682643890381}
07/25/2024 06:33:01 - INFO - __main__ - Step 306: {'lr': 0.00021785714285714287, 'samples': 14688, 'steps': 305, 'loss/train': 5.823093414306641}
07/25/2024 06:33:02 - INFO - __main__ - Step 307: {'lr': 0.0002185714285714286, 'samples': 14736, 'steps': 306, 'loss/train': 6.228525161743164}
07/25/2024 06:33:02 - INFO - __main__ - Step 308: {'lr': 0.0002192857142857143, 'samples': 14784, 'steps': 307, 'loss/train': 5.9510087966918945}
07/25/2024 06:33:02 - INFO - __main__ - Step 309: {'lr': 0.00022, 'samples': 14832, 'steps': 308, 'loss/train': 5.266091346740723}
07/25/2024 06:33:03 - INFO - __main__ - Step 310: {'lr': 0.00022071428571428573, 'samples': 14880, 'steps': 309, 'loss/train': 5.217267036437988}
07/25/2024 06:33:03 - INFO - __main__ - Step 311: {'lr': 0.00022142857142857142, 'samples': 14928, 'steps': 310, 'loss/train': 7.697060585021973}
07/25/2024 06:33:03 - INFO - __main__ - Step 312: {'lr': 0.00022214285714285714, 'samples': 14976, 'steps': 311, 'loss/train': 5.666650772094727}
07/25/2024 06:33:03 - INFO - __main__ - Step 313: {'lr': 0.00022285714285714286, 'samples': 15024, 'steps': 312, 'loss/train': 6.425085067749023}
07/25/2024 06:33:04 - INFO - __main__ - Step 314: {'lr': 0.00022357142857142855, 'samples': 15072, 'steps': 313, 'loss/train': 4.396389007568359}
07/25/2024 06:33:04 - INFO - __main__ - Step 315: {'lr': 0.0002242857142857143, 'samples': 15120, 'steps': 314, 'loss/train': 5.2941131591796875}
07/25/2024 06:33:04 - INFO - __main__ - Step 316: {'lr': 0.00022500000000000002, 'samples': 15168, 'steps': 315, 'loss/train': 5.752312183380127}
07/25/2024 06:33:04 - INFO - __main__ - Step 317: {'lr': 0.00022571428571428571, 'samples': 15216, 'steps': 316, 'loss/train': 6.089960098266602}
07/25/2024 06:33:05 - INFO - __main__ - Step 318: {'lr': 0.00022642857142857143, 'samples': 15264, 'steps': 317, 'loss/train': 5.828670978546143}
07/25/2024 06:33:05 - INFO - __main__ - Step 319: {'lr': 0.00022714285714285715, 'samples': 15312, 'steps': 318, 'loss/train': 5.34361457824707}
07/25/2024 06:33:05 - INFO - __main__ - Step 320: {'lr': 0.00022785714285714287, 'samples': 15360, 'steps': 319, 'loss/train': 3.9433271884918213}
07/25/2024 06:33:06 - INFO - __main__ - Step 321: {'lr': 0.00022857142857142857, 'samples': 15408, 'steps': 320, 'loss/train': 5.489405632019043}
07/25/2024 06:33:06 - INFO - __main__ - Step 322: {'lr': 0.0002292857142857143, 'samples': 15456, 'steps': 321, 'loss/train': 5.065426826477051}
07/25/2024 06:33:06 - INFO - __main__ - Step 323: {'lr': 0.00023, 'samples': 15504, 'steps': 322, 'loss/train': 4.657402038574219}
07/25/2024 06:33:06 - INFO - __main__ - Step 324: {'lr': 0.0002307142857142857, 'samples': 15552, 'steps': 323, 'loss/train': 6.042489528656006}
07/25/2024 06:33:07 - INFO - __main__ - Step 325: {'lr': 0.00023142857142857142, 'samples': 15600, 'steps': 324, 'loss/train': 5.562082290649414}
07/25/2024 06:33:07 - INFO - __main__ - Step 326: {'lr': 0.00023214285714285717, 'samples': 15648, 'steps': 325, 'loss/train': 5.726541519165039}
07/25/2024 06:33:07 - INFO - __main__ - Step 327: {'lr': 0.00023285714285714286, 'samples': 15696, 'steps': 326, 'loss/train': 5.573945045471191}
07/25/2024 06:33:08 - INFO - __main__ - Step 328: {'lr': 0.00023357142857142858, 'samples': 15744, 'steps': 327, 'loss/train': 6.105917930603027}
07/25/2024 06:33:08 - INFO - __main__ - Step 329: {'lr': 0.0002342857142857143, 'samples': 15792, 'steps': 328, 'loss/train': 5.546865463256836}
07/25/2024 06:33:08 - INFO - __main__ - Step 330: {'lr': 0.000235, 'samples': 15840, 'steps': 329, 'loss/train': 5.543821334838867}
07/25/2024 06:33:08 - INFO - __main__ - Step 331: {'lr': 0.0002357142857142857, 'samples': 15888, 'steps': 330, 'loss/train': 5.6774582862854}
07/25/2024 06:33:09 - INFO - __main__ - Step 332: {'lr': 0.00023642857142857143, 'samples': 15936, 'steps': 331, 'loss/train': 5.767722129821777}
07/25/2024 06:33:09 - INFO - __main__ - Step 333: {'lr': 0.00023714285714285715, 'samples': 15984, 'steps': 332, 'loss/train': 5.70899772644043}
07/25/2024 06:33:09 - INFO - __main__ - Step 334: {'lr': 0.00023785714285714285, 'samples': 16032, 'steps': 333, 'loss/train': 5.67036247253418}
07/25/2024 06:33:10 - INFO - __main__ - Step 335: {'lr': 0.00023857142857142857, 'samples': 16080, 'steps': 334, 'loss/train': 5.325812339782715}
07/25/2024 06:33:10 - INFO - __main__ - Step 336: {'lr': 0.0002392857142857143, 'samples': 16128, 'steps': 335, 'loss/train': 5.349172592163086}
07/25/2024 06:33:10 - INFO - __main__ - Step 337: {'lr': 0.00024, 'samples': 16176, 'steps': 336, 'loss/train': 5.448930263519287}
07/25/2024 06:33:10 - INFO - __main__ - Step 338: {'lr': 0.00024071428571428573, 'samples': 16224, 'steps': 337, 'loss/train': 3.7934205532073975}
07/25/2024 06:33:11 - INFO - __main__ - Step 339: {'lr': 0.00024142857142857145, 'samples': 16272, 'steps': 338, 'loss/train': 5.1056013107299805}
07/25/2024 06:33:11 - INFO - __main__ - Step 340: {'lr': 0.00024214285714285714, 'samples': 16320, 'steps': 339, 'loss/train': 5.9682464599609375}
07/25/2024 06:33:11 - INFO - __main__ - Step 341: {'lr': 0.00024285714285714286, 'samples': 16368, 'steps': 340, 'loss/train': 5.546884536743164}
07/25/2024 06:33:12 - INFO - __main__ - Step 342: {'lr': 0.00024357142857142858, 'samples': 16416, 'steps': 341, 'loss/train': 6.586970329284668}
07/25/2024 06:33:12 - INFO - __main__ - Step 343: {'lr': 0.0002442857142857143, 'samples': 16464, 'steps': 342, 'loss/train': 5.654937744140625}
07/25/2024 06:33:12 - INFO - __main__ - Step 344: {'lr': 0.000245, 'samples': 16512, 'steps': 343, 'loss/train': 3.9033658504486084}
07/25/2024 06:33:12 - INFO - __main__ - Step 345: {'lr': 0.00024571428571428574, 'samples': 16560, 'steps': 344, 'loss/train': 6.266292095184326}
07/25/2024 06:33:13 - INFO - __main__ - Step 346: {'lr': 0.00024642857142857143, 'samples': 16608, 'steps': 345, 'loss/train': 5.5901007652282715}
07/25/2024 06:33:13 - INFO - __main__ - Step 347: {'lr': 0.0002471428571428571, 'samples': 16656, 'steps': 346, 'loss/train': 5.836148738861084}
07/25/2024 06:33:13 - INFO - __main__ - Step 348: {'lr': 0.00024785714285714287, 'samples': 16704, 'steps': 347, 'loss/train': 5.447431564331055}
07/25/2024 06:33:13 - INFO - __main__ - Step 349: {'lr': 0.00024857142857142857, 'samples': 16752, 'steps': 348, 'loss/train': 5.124023914337158}
07/25/2024 06:33:14 - INFO - __main__ - Step 350: {'lr': 0.00024928571428571426, 'samples': 16800, 'steps': 349, 'loss/train': 5.541380405426025}
07/25/2024 06:33:14 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:33:14 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:33:17 - INFO - __main__ - Step 350: {'loss/eval': 5.6890645027160645, 'perplexity': 295.616943359375}
07/25/2024 06:33:18 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:33:18 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:33:18 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl