File size: 108,031 Bytes
07/25/2024 06:16:39 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 0
Local process index: 0
Device: cuda:0
Mixed precision type: fp16
07/25/2024 06:16:39 - WARNING - huggingface_hub.repository - /dli/gptesla-small/./ is already a clone of https://huggingface.co/shng2025/gptesla-small. Make sure you pull the latest changes with `repo.git_pull()`.
07/25/2024 06:16:40 - WARNING - huggingface_hub.repository - Revision `hopeful-snow-127` does not exist. Created and checked out branch `hopeful-snow-127`.
07/25/2024 06:16:40 - WARNING - huggingface_hub.repository -
07/25/2024 06:16:41 - DEBUG - datasets.utils._dataset_viewer - Dataset info for shng2025/gptesla-train is not completely ready yet.
07/25/2024 06:16:41 - INFO - datasets.builder - No config specified, defaulting to the single config: gptesla-train/default
07/25/2024 06:16:41 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#1, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#2, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#3, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#4, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#5, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#7, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#8, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#9, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#6, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#10, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#11, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#12, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#13, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#14, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#15, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#16, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#17, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#18, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#19, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#20, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#21, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#22, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#24, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#23, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#25, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#26, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#27, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#28, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#29, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#31, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#30, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#32, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#33, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#34, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#35, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#36, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#37, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#38, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#39, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#40, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#41, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#43, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#44, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#45, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#46, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#42, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#47, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#48, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#49, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#50, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#51, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#52, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#54, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#53, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#55, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#56, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#57, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#58, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#60, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#59, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#61, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#62, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#63, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#64, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#65, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#66, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#68, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#67, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#69, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#70, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#71, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#72, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#73, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#74, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#75, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#77, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#78, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#76, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#79, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#80, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#81, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#82, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#83, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#84, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#85, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#86, ': Starting to iterate over 2/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#87, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#88, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#89, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#90, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#91, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#92, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#94, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#93, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:47 - DEBUG - datasets.iterable_dataset - dataloader worker#95, ': Starting to iterate over 1/183 shards.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491327 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489635 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497218 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500930 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10621496 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10668116 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489599 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492277 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495973 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511604 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491889 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488608 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10562022 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487790 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10863935 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525688 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488098 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525926 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491272 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497335 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486397 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10493913 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10515063 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10751338 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488150 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10949076 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10501535 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:49 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:16:49 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:16:54 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 48, 'steps': 0, 'loss/train': 10.554669380187988}
07/25/2024 06:16:55 - INFO - __main__ - Step 2: {'lr': 7.142857142857143e-07, 'samples': 96, 'steps': 1, 'loss/train': 10.494059562683105}
07/25/2024 06:22:39 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 0
Local process index: 0
Device: cuda:0
Mixed precision type: fp16
07/25/2024 06:22:39 - WARNING - huggingface_hub.repository - /dli/gptesla-small/./ is already a clone of https://huggingface.co/shng2025/gptesla-small. Make sure you pull the latest changes with `repo.git_pull()`.
07/25/2024 06:22:39 - WARNING - huggingface_hub.repository - Revision `celestial-aardvark-128` does not exist. Created and checked out branch `celestial-aardvark-128`.
07/25/2024 06:22:39 - WARNING - huggingface_hub.repository -
07/25/2024 06:22:41 - DEBUG - datasets.utils._dataset_viewer - Dataset info for shng2025/gptesla-train is not completely ready yet.
07/25/2024 06:22:41 - INFO - datasets.builder - No config specified, defaulting to the single config: gptesla-train/default
07/25/2024 06:22:41 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#1, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#4, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#5, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#2, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#3, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#6, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#7, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#8, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#9, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#10, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#12, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#15, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#14, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#16, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#13, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#11, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#17, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#18, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#19, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#20, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#22, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#23, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#24, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#25, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#21, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#28, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#27, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#26, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#29, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#30, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#31, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#32, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#33, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#34, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#35, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#36, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#37, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#38, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#39, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#40, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#41, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#42, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#43, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#44, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#45, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#46, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#47, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#48, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#49, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#50, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#52, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#53, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#54, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#51, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#55, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#56, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#57, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#58, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#59, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#60, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#61, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#62, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#63, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#64, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#65, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#67, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#66, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#68, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#69, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#70, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#72, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#73, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#74, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#75, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#76, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#71, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#77, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#78, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#79, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#80, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#81, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#82, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#83, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#84, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#85, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#86, ': Starting to iterate over 2/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#87, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#88, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#89, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#90, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#92, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#91, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#93, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#94, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.iterable_dataset - dataloader worker#95, ': Starting to iterate over 1/183 shards.
07/25/2024 06:22:46 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500930 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492277 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525688 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489635 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486023 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10668116 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10522596 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10512203 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492861 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497218 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485912 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486397 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10536479 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10863935 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491327 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10562022 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485842 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497111 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489599 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10493913 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10949076 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10553677 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10598254 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10553677 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10515063 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509286 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487790 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485847 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488385 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10610581 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495973 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497062 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488098 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511500 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10525926 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488150 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486801 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488651 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10499106 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10552417 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486616 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491272 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511604 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11286262 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491889 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487725 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486276 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11286262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10488608 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10501535 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10497335 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10509262 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10489575 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10485918 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10491547 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487097 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10495520 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10751338 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10621496 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10498167 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10486172 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10686322 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10499607 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10511515 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 11115863 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10530453 bytes couldn't be parsed with block_size=655360. Retrying with block_size=1310720.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10492554 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10640425 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10487482 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:47 - DEBUG - datasets.packaged_modules.json.json - Batch of 10500290 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:48 - DEBUG - datasets.packaged_modules.json.json - Batch of 10676628 bytes couldn't be parsed with block_size=327680. Retrying with block_size=655360.
07/25/2024 06:22:59 - INFO - __main__ - Step 1: {'lr': 0.0, 'samples': 48, 'steps': 0, 'loss/train': 10.554669380187988}
07/25/2024 06:23:02 - INFO - __main__ - Step 2: {'lr': 7.142857142857143e-07, 'samples': 96, 'steps': 1, 'loss/train': 10.494059562683105}
07/25/2024 06:23:02 - INFO - __main__ - Step 3: {'lr': 1.4285714285714286e-06, 'samples': 144, 'steps': 2, 'loss/train': 10.507988929748535}
07/25/2024 06:23:03 - INFO - __main__ - Step 4: {'lr': 2.142857142857143e-06, 'samples': 192, 'steps': 3, 'loss/train': 10.415447235107422}
07/25/2024 06:23:03 - INFO - __main__ - Step 5: {'lr': 2.8571428571428573e-06, 'samples': 240, 'steps': 4, 'loss/train': 10.345850944519043}
07/25/2024 06:23:03 - INFO - __main__ - Step 6: {'lr': 3.5714285714285714e-06, 'samples': 288, 'steps': 5, 'loss/train': 10.195524215698242}
07/25/2024 06:23:03 - INFO - __main__ - Step 7: {'lr': 4.285714285714286e-06, 'samples': 336, 'steps': 6, 'loss/train': 10.09341812133789}
07/25/2024 06:23:04 - INFO - __main__ - Step 8: {'lr': 5e-06, 'samples': 384, 'steps': 7, 'loss/train': 9.965239524841309}
07/25/2024 06:23:04 - INFO - __main__ - Step 9: {'lr': 5.7142857142857145e-06, 'samples': 432, 'steps': 8, 'loss/train': 9.698853492736816}
07/25/2024 06:23:04 - INFO - __main__ - Step 10: {'lr': 6.428571428571429e-06, 'samples': 480, 'steps': 9, 'loss/train': 9.80683708190918}
07/25/2024 06:23:05 - INFO - __main__ - Step 11: {'lr': 7.142857142857143e-06, 'samples': 528, 'steps': 10, 'loss/train': 9.633079528808594}
07/25/2024 06:23:05 - INFO - __main__ - Step 12: {'lr': 7.857142857142858e-06, 'samples': 576, 'steps': 11, 'loss/train': 9.700591087341309}
07/25/2024 06:23:05 - INFO - __main__ - Step 13: {'lr': 8.571428571428573e-06, 'samples': 624, 'steps': 12, 'loss/train': 9.603139877319336}
07/25/2024 06:23:05 - INFO - __main__ - Step 14: {'lr': 9.285714285714286e-06, 'samples': 672, 'steps': 13, 'loss/train': 9.30308723449707}
07/25/2024 06:23:06 - INFO - __main__ - Step 15: {'lr': 1e-05, 'samples': 720, 'steps': 14, 'loss/train': 9.333526611328125}
07/25/2024 06:23:06 - INFO - __main__ - Step 16: {'lr': 1.0714285714285714e-05, 'samples': 768, 'steps': 15, 'loss/train': 8.336181640625}
07/25/2024 06:23:06 - INFO - __main__ - Step 17: {'lr': 1.1428571428571429e-05, 'samples': 816, 'steps': 16, 'loss/train': 9.075631141662598}
07/25/2024 06:23:07 - INFO - __main__ - Step 18: {'lr': 1.2142857142857142e-05, 'samples': 864, 'steps': 17, 'loss/train': 9.18478012084961}
07/25/2024 06:23:07 - INFO - __main__ - Step 19: {'lr': 1.2857142857142857e-05, 'samples': 912, 'steps': 18, 'loss/train': 8.96328353881836}
07/25/2024 06:23:07 - INFO - __main__ - Step 20: {'lr': 1.3571428571428572e-05, 'samples': 960, 'steps': 19, 'loss/train': 9.45018196105957}
07/25/2024 06:23:07 - INFO - __main__ - Step 21: {'lr': 1.4285714285714285e-05, 'samples': 1008, 'steps': 20, 'loss/train': 8.517333984375}
07/25/2024 06:23:08 - INFO - __main__ - Step 22: {'lr': 1.5e-05, 'samples': 1056, 'steps': 21, 'loss/train': 9.207684516906738}
07/25/2024 06:23:08 - INFO - __main__ - Step 23: {'lr': 1.5714285714285715e-05, 'samples': 1104, 'steps': 22, 'loss/train': 8.681092262268066}
07/25/2024 06:23:08 - INFO - __main__ - Step 24: {'lr': 1.642857142857143e-05, 'samples': 1152, 'steps': 23, 'loss/train': 8.316036224365234}
07/25/2024 06:23:09 - INFO - __main__ - Step 25: {'lr': 1.7142857142857145e-05, 'samples': 1200, 'steps': 24, 'loss/train': 8.944169044494629}
07/25/2024 06:23:09 - INFO - __main__ - Step 26: {'lr': 1.7857142857142855e-05, 'samples': 1248, 'steps': 25, 'loss/train': 8.878201484680176}
07/25/2024 06:23:09 - INFO - __main__ - Step 27: {'lr': 1.8571428571428572e-05, 'samples': 1296, 'steps': 26, 'loss/train': 9.158102989196777}
07/25/2024 06:23:09 - INFO - __main__ - Step 28: {'lr': 1.9285714285714285e-05, 'samples': 1344, 'steps': 27, 'loss/train': 9.14354419708252}
07/25/2024 06:23:10 - INFO - __main__ - Step 29: {'lr': 2e-05, 'samples': 1392, 'steps': 28, 'loss/train': 8.860624313354492}
07/25/2024 06:23:10 - INFO - __main__ - Step 30: {'lr': 2.0714285714285715e-05, 'samples': 1440, 'steps': 29, 'loss/train': 8.876450538635254}
07/25/2024 06:23:10 - INFO - __main__ - Step 31: {'lr': 2.1428571428571428e-05, 'samples': 1488, 'steps': 30, 'loss/train': 8.425738334655762}
07/25/2024 06:23:10 - INFO - __main__ - Step 32: {'lr': 2.214285714285714e-05, 'samples': 1536, 'steps': 31, 'loss/train': 8.942279815673828}
07/25/2024 06:23:11 - INFO - __main__ - Step 33: {'lr': 2.2857142857142858e-05, 'samples': 1584, 'steps': 32, 'loss/train': 8.757084846496582}
07/25/2024 06:23:11 - INFO - __main__ - Step 34: {'lr': 2.3571428571428575e-05, 'samples': 1632, 'steps': 33, 'loss/train': 8.699286460876465}
07/25/2024 06:23:11 - INFO - __main__ - Step 35: {'lr': 2.4285714285714285e-05, 'samples': 1680, 'steps': 34, 'loss/train': 8.857367515563965}
07/25/2024 06:23:12 - INFO - __main__ - Step 36: {'lr': 2.5e-05, 'samples': 1728, 'steps': 35, 'loss/train': 8.830195426940918}
07/25/2024 06:23:12 - INFO - __main__ - Step 37: {'lr': 2.5714285714285714e-05, 'samples': 1776, 'steps': 36, 'loss/train': 8.944982528686523}
07/25/2024 06:23:12 - INFO - __main__ - Step 38: {'lr': 2.642857142857143e-05, 'samples': 1824, 'steps': 37, 'loss/train': 8.670278549194336}
07/25/2024 06:23:12 - INFO - __main__ - Step 39: {'lr': 2.7142857142857144e-05, 'samples': 1872, 'steps': 38, 'loss/train': 8.710525512695312}
07/25/2024 06:23:13 - INFO - __main__ - Step 40: {'lr': 2.7857142857142858e-05, 'samples': 1920, 'steps': 39, 'loss/train': 7.902089595794678}
07/25/2024 06:23:13 - INFO - __main__ - Step 41: {'lr': 2.857142857142857e-05, 'samples': 1968, 'steps': 40, 'loss/train': 8.400484085083008}
07/25/2024 06:23:13 - INFO - __main__ - Step 42: {'lr': 2.9285714285714288e-05, 'samples': 2016, 'steps': 41, 'loss/train': 8.789310455322266}
07/25/2024 06:23:14 - INFO - __main__ - Step 43: {'lr': 3e-05, 'samples': 2064, 'steps': 42, 'loss/train': 8.754344940185547}
07/25/2024 06:23:14 - INFO - __main__ - Step 44: {'lr': 3.071428571428572e-05, 'samples': 2112, 'steps': 43, 'loss/train': 8.84192943572998}
07/25/2024 06:23:14 - INFO - __main__ - Step 45: {'lr': 3.142857142857143e-05, 'samples': 2160, 'steps': 44, 'loss/train': 8.784793853759766}
07/25/2024 06:23:14 - INFO - __main__ - Step 46: {'lr': 3.214285714285714e-05, 'samples': 2208, 'steps': 45, 'loss/train': 8.67403793334961}
07/25/2024 06:23:15 - INFO - __main__ - Step 47: {'lr': 3.285714285714286e-05, 'samples': 2256, 'steps': 46, 'loss/train': 8.51427173614502}
07/25/2024 06:23:15 - INFO - __main__ - Step 48: {'lr': 3.357142857142857e-05, 'samples': 2304, 'steps': 47, 'loss/train': 8.48193073272705}
07/25/2024 06:23:15 - INFO - __main__ - Step 49: {'lr': 3.428571428571429e-05, 'samples': 2352, 'steps': 48, 'loss/train': 8.518038749694824}
07/25/2024 06:23:15 - INFO - __main__ - Step 50: {'lr': 3.5000000000000004e-05, 'samples': 2400, 'steps': 49, 'loss/train': 8.63569450378418}
07/25/2024 06:23:16 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:23:16 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:23:19 - INFO - __main__ - Step 50: {'loss/eval': 8.551246643066406, 'perplexity': 5173.19970703125}
07/25/2024 06:23:20 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:23:20 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:23:20 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:23:21 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:24:11 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
4d63b0c..304deea celestial-aardvark-128 -> celestial-aardvark-128
07/25/2024 06:24:11 - INFO - __main__ - Step 51: {'lr': 3.571428571428571e-05, 'samples': 2448, 'steps': 50, 'loss/train': 8.343396186828613}
07/25/2024 06:24:11 - INFO - __main__ - Step 52: {'lr': 3.642857142857143e-05, 'samples': 2496, 'steps': 51, 'loss/train': 8.461634635925293}
07/25/2024 06:24:12 - INFO - __main__ - Step 53: {'lr': 3.7142857142857143e-05, 'samples': 2544, 'steps': 52, 'loss/train': 8.43316650390625}
07/25/2024 06:24:12 - INFO - __main__ - Step 54: {'lr': 3.7857142857142864e-05, 'samples': 2592, 'steps': 53, 'loss/train': 8.464268684387207}
07/25/2024 06:24:12 - INFO - __main__ - Step 55: {'lr': 3.857142857142857e-05, 'samples': 2640, 'steps': 54, 'loss/train': 8.371450424194336}
07/25/2024 06:24:12 - INFO - __main__ - Step 56: {'lr': 3.928571428571428e-05, 'samples': 2688, 'steps': 55, 'loss/train': 8.155680656433105}
07/25/2024 06:24:13 - INFO - __main__ - Step 57: {'lr': 4e-05, 'samples': 2736, 'steps': 56, 'loss/train': 8.359997749328613}
07/25/2024 06:24:13 - INFO - __main__ - Step 58: {'lr': 4.0714285714285717e-05, 'samples': 2784, 'steps': 57, 'loss/train': 7.883953094482422}
07/25/2024 06:24:13 - INFO - __main__ - Step 59: {'lr': 4.142857142857143e-05, 'samples': 2832, 'steps': 58, 'loss/train': 8.425983428955078}
07/25/2024 06:24:14 - INFO - __main__ - Step 60: {'lr': 4.214285714285714e-05, 'samples': 2880, 'steps': 59, 'loss/train': 8.220914840698242}
07/25/2024 06:24:14 - INFO - __main__ - Step 61: {'lr': 4.2857142857142856e-05, 'samples': 2928, 'steps': 60, 'loss/train': 8.216103553771973}
07/25/2024 06:24:14 - INFO - __main__ - Step 62: {'lr': 4.3571428571428576e-05, 'samples': 2976, 'steps': 61, 'loss/train': 8.129951477050781}
07/25/2024 06:24:14 - INFO - __main__ - Step 63: {'lr': 4.428571428571428e-05, 'samples': 3024, 'steps': 62, 'loss/train': 7.993805885314941}
07/25/2024 06:24:15 - INFO - __main__ - Step 64: {'lr': 4.4999999999999996e-05, 'samples': 3072, 'steps': 63, 'loss/train': 6.955376625061035}
07/25/2024 06:24:15 - INFO - __main__ - Step 65: {'lr': 4.5714285714285716e-05, 'samples': 3120, 'steps': 64, 'loss/train': 7.9038238525390625}
07/25/2024 06:24:15 - INFO - __main__ - Step 66: {'lr': 4.642857142857143e-05, 'samples': 3168, 'steps': 65, 'loss/train': 7.659880638122559}
07/25/2024 06:24:16 - INFO - __main__ - Step 67: {'lr': 4.714285714285715e-05, 'samples': 3216, 'steps': 66, 'loss/train': 7.462357521057129}
07/25/2024 06:24:16 - INFO - __main__ - Step 68: {'lr': 4.7857142857142856e-05, 'samples': 3264, 'steps': 67, 'loss/train': 7.9803571701049805}
07/25/2024 06:24:16 - INFO - __main__ - Step 69: {'lr': 4.857142857142857e-05, 'samples': 3312, 'steps': 68, 'loss/train': 7.895639896392822}
07/25/2024 06:24:16 - INFO - __main__ - Step 70: {'lr': 4.928571428571429e-05, 'samples': 3360, 'steps': 69, 'loss/train': 7.726537704467773}
07/25/2024 06:24:17 - INFO - __main__ - Step 71: {'lr': 5e-05, 'samples': 3408, 'steps': 70, 'loss/train': 7.8505425453186035}
07/25/2024 06:24:17 - INFO - __main__ - Step 72: {'lr': 5.0714285714285716e-05, 'samples': 3456, 'steps': 71, 'loss/train': 7.492800235748291}
07/25/2024 06:24:17 - INFO - __main__ - Step 73: {'lr': 5.142857142857143e-05, 'samples': 3504, 'steps': 72, 'loss/train': 7.890054225921631}
07/25/2024 06:24:18 - INFO - __main__ - Step 74: {'lr': 5.214285714285714e-05, 'samples': 3552, 'steps': 73, 'loss/train': 7.429488182067871}
07/25/2024 06:24:18 - INFO - __main__ - Step 75: {'lr': 5.285714285714286e-05, 'samples': 3600, 'steps': 74, 'loss/train': 7.520913600921631}
07/25/2024 06:24:18 - INFO - __main__ - Step 76: {'lr': 5.357142857142857e-05, 'samples': 3648, 'steps': 75, 'loss/train': 7.66839075088501}
07/25/2024 06:24:18 - INFO - __main__ - Step 77: {'lr': 5.428571428571429e-05, 'samples': 3696, 'steps': 76, 'loss/train': 7.810487270355225}
07/25/2024 06:24:19 - INFO - __main__ - Step 78: {'lr': 5.5e-05, 'samples': 3744, 'steps': 77, 'loss/train': 7.009271621704102}
07/25/2024 06:24:19 - INFO - __main__ - Step 79: {'lr': 5.5714285714285715e-05, 'samples': 3792, 'steps': 78, 'loss/train': 7.631109714508057}
07/25/2024 06:24:19 - INFO - __main__ - Step 80: {'lr': 5.642857142857143e-05, 'samples': 3840, 'steps': 79, 'loss/train': 6.9839606285095215}
07/25/2024 06:24:20 - INFO - __main__ - Step 81: {'lr': 5.714285714285714e-05, 'samples': 3888, 'steps': 80, 'loss/train': 7.642471790313721}
07/25/2024 06:24:20 - INFO - __main__ - Step 82: {'lr': 5.7857142857142855e-05, 'samples': 3936, 'steps': 81, 'loss/train': 7.183259010314941}
07/25/2024 06:24:20 - INFO - __main__ - Step 83: {'lr': 5.8571428571428575e-05, 'samples': 3984, 'steps': 82, 'loss/train': 7.3919596672058105}
07/25/2024 06:24:20 - INFO - __main__ - Step 84: {'lr': 5.928571428571429e-05, 'samples': 4032, 'steps': 83, 'loss/train': 7.52573299407959}
07/25/2024 06:24:21 - INFO - __main__ - Step 85: {'lr': 6e-05, 'samples': 4080, 'steps': 84, 'loss/train': 7.169320583343506}
07/25/2024 06:24:21 - INFO - __main__ - Step 86: {'lr': 6.0714285714285715e-05, 'samples': 4128, 'steps': 85, 'loss/train': 7.095631122589111}
07/25/2024 06:24:21 - INFO - __main__ - Step 87: {'lr': 6.142857142857143e-05, 'samples': 4176, 'steps': 86, 'loss/train': 7.257204532623291}
07/25/2024 06:24:21 - INFO - __main__ - Step 88: {'lr': 6.214285714285714e-05, 'samples': 4224, 'steps': 87, 'loss/train': 6.010106563568115}
07/25/2024 06:24:22 - INFO - __main__ - Step 89: {'lr': 6.285714285714286e-05, 'samples': 4272, 'steps': 88, 'loss/train': 7.189196586608887}
07/25/2024 06:24:22 - INFO - __main__ - Step 90: {'lr': 6.357142857142857e-05, 'samples': 4320, 'steps': 89, 'loss/train': 6.902089595794678}
07/25/2024 06:24:22 - INFO - __main__ - Step 91: {'lr': 6.428571428571427e-05, 'samples': 4368, 'steps': 90, 'loss/train': 6.5942535400390625}
07/25/2024 06:24:23 - INFO - __main__ - Step 92: {'lr': 6.500000000000001e-05, 'samples': 4416, 'steps': 91, 'loss/train': 7.392148017883301}
07/25/2024 06:24:23 - INFO - __main__ - Step 93: {'lr': 6.571428571428571e-05, 'samples': 4464, 'steps': 92, 'loss/train': 6.586553573608398}
07/25/2024 06:24:23 - INFO - __main__ - Step 94: {'lr': 6.642857142857143e-05, 'samples': 4512, 'steps': 93, 'loss/train': 7.5296549797058105}
07/25/2024 06:24:23 - INFO - __main__ - Step 95: {'lr': 6.714285714285714e-05, 'samples': 4560, 'steps': 94, 'loss/train': 7.048985481262207}
07/25/2024 06:24:24 - INFO - __main__ - Step 96: {'lr': 6.785714285714285e-05, 'samples': 4608, 'steps': 95, 'loss/train': 4.687469959259033}
07/25/2024 06:24:24 - INFO - __main__ - Step 97: {'lr': 6.857142857142858e-05, 'samples': 4656, 'steps': 96, 'loss/train': 7.1623854637146}
07/25/2024 06:24:24 - INFO - __main__ - Step 98: {'lr': 6.928571428571429e-05, 'samples': 4704, 'steps': 97, 'loss/train': 6.722190856933594}
07/25/2024 06:24:25 - INFO - __main__ - Step 99: {'lr': 7.000000000000001e-05, 'samples': 4752, 'steps': 98, 'loss/train': 6.930887699127197}
07/25/2024 06:24:25 - INFO - __main__ - Step 100: {'lr': 7.071428571428571e-05, 'samples': 4800, 'steps': 99, 'loss/train': 7.2268805503845215}
07/25/2024 06:24:25 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:24:25 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:24:28 - INFO - __main__ - Step 100: {'loss/eval': 7.000552177429199, 'perplexity': 1097.2388916015625}
07/25/2024 06:24:29 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:24:29 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:24:29 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:24:31 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:25:31 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream.
07/25/2024 06:25:31 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:25:53 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
304deea..1e419a3 celestial-aardvark-128 -> celestial-aardvark-128
07/25/2024 06:25:54 - INFO - __main__ - Step 101: {'lr': 7.142857142857142e-05, 'samples': 4848, 'steps': 100, 'loss/train': 6.872276306152344}
07/25/2024 06:25:55 - INFO - __main__ - Step 102: {'lr': 7.214285714285715e-05, 'samples': 4896, 'steps': 101, 'loss/train': 5.04807710647583}
07/25/2024 06:25:55 - INFO - __main__ - Step 103: {'lr': 7.285714285714286e-05, 'samples': 4944, 'steps': 102, 'loss/train': 6.8386383056640625}
07/25/2024 06:25:55 - INFO - __main__ - Step 104: {'lr': 7.357142857142857e-05, 'samples': 4992, 'steps': 103, 'loss/train': 6.707127571105957}
07/25/2024 06:25:56 - INFO - __main__ - Step 105: {'lr': 7.428571428571429e-05, 'samples': 5040, 'steps': 104, 'loss/train': 6.885215759277344}
07/25/2024 06:25:56 - INFO - __main__ - Step 106: {'lr': 7.5e-05, 'samples': 5088, 'steps': 105, 'loss/train': 6.762844562530518}
07/25/2024 06:25:56 - INFO - __main__ - Step 107: {'lr': 7.571428571428573e-05, 'samples': 5136, 'steps': 106, 'loss/train': 6.92085599899292}
07/25/2024 06:25:56 - INFO - __main__ - Step 108: {'lr': 7.642857142857143e-05, 'samples': 5184, 'steps': 107, 'loss/train': 6.639281749725342}
07/25/2024 06:25:57 - INFO - __main__ - Step 109: {'lr': 7.714285714285714e-05, 'samples': 5232, 'steps': 108, 'loss/train': 6.710461616516113}
07/25/2024 06:25:57 - INFO - __main__ - Step 110: {'lr': 7.785714285714286e-05, 'samples': 5280, 'steps': 109, 'loss/train': 3.4145185947418213}
07/25/2024 06:25:57 - INFO - __main__ - Step 111: {'lr': 7.857142857142857e-05, 'samples': 5328, 'steps': 110, 'loss/train': 6.69966983795166}
07/25/2024 06:25:58 - INFO - __main__ - Step 112: {'lr': 7.928571428571429e-05, 'samples': 5376, 'steps': 111, 'loss/train': 6.780115127563477}
07/25/2024 06:25:58 - INFO - __main__ - Step 113: {'lr': 8e-05, 'samples': 5424, 'steps': 112, 'loss/train': 6.512848377227783}
07/25/2024 06:25:58 - INFO - __main__ - Step 114: {'lr': 8.071428571428571e-05, 'samples': 5472, 'steps': 113, 'loss/train': 6.558418273925781}
07/25/2024 06:25:58 - INFO - __main__ - Step 115: {'lr': 8.142857142857143e-05, 'samples': 5520, 'steps': 114, 'loss/train': 6.531116485595703}
07/25/2024 06:25:59 - INFO - __main__ - Step 116: {'lr': 8.214285714285714e-05, 'samples': 5568, 'steps': 115, 'loss/train': 6.557308197021484}
07/25/2024 06:25:59 - INFO - __main__ - Step 117: {'lr': 8.285714285714286e-05, 'samples': 5616, 'steps': 116, 'loss/train': 6.023952960968018}
07/25/2024 06:25:59 - INFO - __main__ - Step 118: {'lr': 8.357142857142858e-05, 'samples': 5664, 'steps': 117, 'loss/train': 7.063660144805908}
07/25/2024 06:26:00 - INFO - __main__ - Step 119: {'lr': 8.428571428571429e-05, 'samples': 5712, 'steps': 118, 'loss/train': 6.6882853507995605}
07/25/2024 06:26:00 - INFO - __main__ - Step 120: {'lr': 8.5e-05, 'samples': 5760, 'steps': 119, 'loss/train': 5.413237571716309}
07/25/2024 06:26:00 - INFO - __main__ - Step 121: {'lr': 8.571428571428571e-05, 'samples': 5808, 'steps': 120, 'loss/train': 6.166462421417236}
07/25/2024 06:26:00 - INFO - __main__ - Step 122: {'lr': 8.642857142857143e-05, 'samples': 5856, 'steps': 121, 'loss/train': 6.413567543029785}
07/25/2024 06:26:01 - INFO - __main__ - Step 123: {'lr': 8.714285714285715e-05, 'samples': 5904, 'steps': 122, 'loss/train': 6.3801727294921875}
07/25/2024 06:26:01 - INFO - __main__ - Step 124: {'lr': 8.785714285714286e-05, 'samples': 5952, 'steps': 123, 'loss/train': 7.042605400085449}
07/25/2024 06:26:01 - INFO - __main__ - Step 125: {'lr': 8.857142857142857e-05, 'samples': 6000, 'steps': 124, 'loss/train': 6.735599517822266}
07/25/2024 06:26:01 - INFO - __main__ - Step 126: {'lr': 8.928571428571429e-05, 'samples': 6048, 'steps': 125, 'loss/train': 6.620289325714111}
07/25/2024 06:26:02 - INFO - __main__ - Step 127: {'lr': 8.999999999999999e-05, 'samples': 6096, 'steps': 126, 'loss/train': 6.738864421844482}
07/25/2024 06:26:02 - INFO - __main__ - Step 128: {'lr': 9.071428571428573e-05, 'samples': 6144, 'steps': 127, 'loss/train': 6.406912326812744}
07/25/2024 06:26:02 - INFO - __main__ - Step 129: {'lr': 9.142857142857143e-05, 'samples': 6192, 'steps': 128, 'loss/train': 6.422929286956787}
07/25/2024 06:26:03 - INFO - __main__ - Step 130: {'lr': 9.214285714285714e-05, 'samples': 6240, 'steps': 129, 'loss/train': 6.476966381072998}
07/25/2024 06:26:03 - INFO - __main__ - Step 131: {'lr': 9.285714285714286e-05, 'samples': 6288, 'steps': 130, 'loss/train': 6.289211273193359}
07/25/2024 06:26:03 - INFO - __main__ - Step 132: {'lr': 9.357142857142857e-05, 'samples': 6336, 'steps': 131, 'loss/train': 6.4881696701049805}
07/25/2024 06:26:03 - INFO - __main__ - Step 133: {'lr': 9.42857142857143e-05, 'samples': 6384, 'steps': 132, 'loss/train': 6.840321063995361}
07/25/2024 06:26:04 - INFO - __main__ - Step 134: {'lr': 9.5e-05, 'samples': 6432, 'steps': 133, 'loss/train': 6.22948694229126}
07/25/2024 06:26:04 - INFO - __main__ - Step 135: {'lr': 9.571428571428571e-05, 'samples': 6480, 'steps': 134, 'loss/train': 5.924211025238037}
07/25/2024 06:26:04 - INFO - __main__ - Step 136: {'lr': 9.642857142857143e-05, 'samples': 6528, 'steps': 135, 'loss/train': 8.402527809143066}
07/25/2024 06:26:05 - INFO - __main__ - Step 137: {'lr': 9.714285714285714e-05, 'samples': 6576, 'steps': 136, 'loss/train': 6.357081413269043}
07/25/2024 06:26:05 - INFO - __main__ - Step 138: {'lr': 9.785714285714286e-05, 'samples': 6624, 'steps': 137, 'loss/train': 6.335728168487549}
07/25/2024 06:26:05 - INFO - __main__ - Step 139: {'lr': 9.857142857142858e-05, 'samples': 6672, 'steps': 138, 'loss/train': 6.388386249542236}
07/25/2024 06:26:05 - INFO - __main__ - Step 140: {'lr': 9.928571428571428e-05, 'samples': 6720, 'steps': 139, 'loss/train': 6.144318103790283}
07/25/2024 06:26:06 - INFO - __main__ - Step 141: {'lr': 0.0001, 'samples': 6768, 'steps': 140, 'loss/train': 5.887519359588623}
07/25/2024 06:26:06 - INFO - __main__ - Step 142: {'lr': 0.00010071428571428571, 'samples': 6816, 'steps': 141, 'loss/train': 6.515809059143066}
07/25/2024 06:26:06 - INFO - __main__ - Step 143: {'lr': 0.00010142857142857143, 'samples': 6864, 'steps': 142, 'loss/train': 6.273582458496094}
07/25/2024 06:26:07 - INFO - __main__ - Step 144: {'lr': 0.00010214285714285715, 'samples': 6912, 'steps': 143, 'loss/train': 6.12056303024292}
07/25/2024 06:26:07 - INFO - __main__ - Step 145: {'lr': 0.00010285714285714286, 'samples': 6960, 'steps': 144, 'loss/train': 6.281930446624756}
07/25/2024 06:26:07 - INFO - __main__ - Step 146: {'lr': 0.00010357142857142858, 'samples': 7008, 'steps': 145, 'loss/train': 6.347898483276367}
07/25/2024 06:26:07 - INFO - __main__ - Step 147: {'lr': 0.00010428571428571428, 'samples': 7056, 'steps': 146, 'loss/train': 6.053178787231445}
07/25/2024 06:26:08 - INFO - __main__ - Step 148: {'lr': 0.000105, 'samples': 7104, 'steps': 147, 'loss/train': 6.299071311950684}
07/25/2024 06:26:08 - INFO - __main__ - Step 149: {'lr': 0.00010571428571428572, 'samples': 7152, 'steps': 148, 'loss/train': 6.214033603668213}
07/25/2024 06:26:08 - INFO - __main__ - Step 150: {'lr': 0.00010642857142857143, 'samples': 7200, 'steps': 149, 'loss/train': 6.36629056930542}
07/25/2024 06:26:08 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:26:08 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:26:12 - INFO - __main__ - Step 150: {'loss/eval': 6.422665119171143, 'perplexity': 615.6417236328125}
07/25/2024 06:26:12 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:26:12 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:26:13 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:26:14 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:27:15 - WARNING - huggingface_hub.repository - Several commits (3) will be pushed upstream.
07/25/2024 06:27:15 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:27:38 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
1e419a3..dcc8019 celestial-aardvark-128 -> celestial-aardvark-128
07/25/2024 06:27:38 - INFO - __main__ - Step 151: {'lr': 0.00010714285714285714, 'samples': 7248, 'steps': 150, 'loss/train': 6.574682235717773}
07/25/2024 06:27:38 - INFO - __main__ - Step 152: {'lr': 0.00010785714285714286, 'samples': 7296, 'steps': 151, 'loss/train': 5.2919840812683105}
07/25/2024 06:27:39 - INFO - __main__ - Step 153: {'lr': 0.00010857142857142858, 'samples': 7344, 'steps': 152, 'loss/train': 6.282163143157959}
07/25/2024 06:27:39 - INFO - __main__ - Step 154: {'lr': 0.0001092857142857143, 'samples': 7392, 'steps': 153, 'loss/train': 6.462711334228516}
07/25/2024 06:27:39 - INFO - __main__ - Step 155: {'lr': 0.00011, 'samples': 7440, 'steps': 154, 'loss/train': 5.595396518707275}
07/25/2024 06:27:39 - INFO - __main__ - Step 156: {'lr': 0.00011071428571428571, 'samples': 7488, 'steps': 155, 'loss/train': 6.128833293914795}
07/25/2024 06:27:40 - INFO - __main__ - Step 157: {'lr': 0.00011142857142857143, 'samples': 7536, 'steps': 156, 'loss/train': 6.035909652709961}
07/25/2024 06:27:40 - INFO - __main__ - Step 158: {'lr': 0.00011214285714285715, 'samples': 7584, 'steps': 157, 'loss/train': 6.275477886199951}
07/25/2024 06:27:40 - INFO - __main__ - Step 159: {'lr': 0.00011285714285714286, 'samples': 7632, 'steps': 158, 'loss/train': 6.1195969581604}
07/25/2024 06:27:40 - INFO - __main__ - Step 160: {'lr': 0.00011357142857142858, 'samples': 7680, 'steps': 159, 'loss/train': 8.316116333007812}
07/25/2024 06:27:41 - INFO - __main__ - Step 161: {'lr': 0.00011428571428571428, 'samples': 7728, 'steps': 160, 'loss/train': 6.287449836730957}
07/25/2024 06:27:41 - INFO - __main__ - Step 162: {'lr': 0.000115, 'samples': 7776, 'steps': 161, 'loss/train': 5.879787445068359}
07/25/2024 06:27:41 - INFO - __main__ - Step 163: {'lr': 0.00011571428571428571, 'samples': 7824, 'steps': 162, 'loss/train': 6.221517086029053}
07/25/2024 06:27:42 - INFO - __main__ - Step 164: {'lr': 0.00011642857142857143, 'samples': 7872, 'steps': 163, 'loss/train': 5.967787265777588}
07/25/2024 06:27:42 - INFO - __main__ - Step 165: {'lr': 0.00011714285714285715, 'samples': 7920, 'steps': 164, 'loss/train': 6.09508752822876}
07/25/2024 06:27:42 - INFO - __main__ - Step 166: {'lr': 0.00011785714285714286, 'samples': 7968, 'steps': 165, 'loss/train': 6.462942123413086}
07/25/2024 06:27:42 - INFO - __main__ - Step 167: {'lr': 0.00011857142857142858, 'samples': 8016, 'steps': 166, 'loss/train': 6.146663188934326}
07/25/2024 06:27:43 - INFO - __main__ - Step 168: {'lr': 0.00011928571428571428, 'samples': 8064, 'steps': 167, 'loss/train': 6.4038286209106445}
07/25/2024 06:27:43 - INFO - __main__ - Step 169: {'lr': 0.00012, 'samples': 8112, 'steps': 168, 'loss/train': 6.267633438110352}
07/25/2024 06:27:43 - INFO - __main__ - Step 170: {'lr': 0.00012071428571428572, 'samples': 8160, 'steps': 169, 'loss/train': 6.64249324798584}
07/25/2024 06:27:44 - INFO - __main__ - Step 171: {'lr': 0.00012142857142857143, 'samples': 8208, 'steps': 170, 'loss/train': 6.448271751403809}
07/25/2024 06:27:44 - INFO - __main__ - Step 172: {'lr': 0.00012214285714285715, 'samples': 8256, 'steps': 171, 'loss/train': 6.485412120819092}
07/25/2024 06:27:44 - INFO - __main__ - Step 173: {'lr': 0.00012285714285714287, 'samples': 8304, 'steps': 172, 'loss/train': 6.213407516479492}
07/25/2024 06:27:44 - INFO - __main__ - Step 174: {'lr': 0.00012357142857142856, 'samples': 8352, 'steps': 173, 'loss/train': 5.832103729248047}
07/25/2024 06:27:45 - INFO - __main__ - Step 175: {'lr': 0.00012428571428571428, 'samples': 8400, 'steps': 174, 'loss/train': 5.645206928253174}
07/25/2024 06:27:45 - INFO - __main__ - Step 176: {'lr': 0.000125, 'samples': 8448, 'steps': 175, 'loss/train': 5.942577838897705}
07/25/2024 06:27:45 - INFO - __main__ - Step 177: {'lr': 0.00012571428571428572, 'samples': 8496, 'steps': 176, 'loss/train': 6.108009338378906}
07/25/2024 06:27:46 - INFO - __main__ - Step 178: {'lr': 0.00012642857142857142, 'samples': 8544, 'steps': 177, 'loss/train': 6.048696994781494}
07/25/2024 06:27:46 - INFO - __main__ - Step 179: {'lr': 0.00012714285714285714, 'samples': 8592, 'steps': 178, 'loss/train': 6.014152526855469}
07/25/2024 06:27:46 - INFO - __main__ - Step 180: {'lr': 0.00012785714285714286, 'samples': 8640, 'steps': 179, 'loss/train': 6.590332508087158}
07/25/2024 06:27:46 - INFO - __main__ - Step 181: {'lr': 0.00012857142857142855, 'samples': 8688, 'steps': 180, 'loss/train': 6.095800399780273}
07/25/2024 06:27:47 - INFO - __main__ - Step 182: {'lr': 0.0001292857142857143, 'samples': 8736, 'steps': 181, 'loss/train': 5.968374729156494}
07/25/2024 06:27:47 - INFO - __main__ - Step 183: {'lr': 0.00013000000000000002, 'samples': 8784, 'steps': 182, 'loss/train': 6.073035717010498}
07/25/2024 06:27:47 - INFO - __main__ - Step 184: {'lr': 0.00013071428571428574, 'samples': 8832, 'steps': 183, 'loss/train': 7.681509494781494}
07/25/2024 06:27:47 - INFO - __main__ - Step 185: {'lr': 0.00013142857142857143, 'samples': 8880, 'steps': 184, 'loss/train': 5.806171417236328}
07/25/2024 06:27:48 - INFO - __main__ - Step 186: {'lr': 0.00013214285714285715, 'samples': 8928, 'steps': 185, 'loss/train': 5.868297576904297}
07/25/2024 06:27:48 - INFO - __main__ - Step 187: {'lr': 0.00013285714285714287, 'samples': 8976, 'steps': 186, 'loss/train': 5.532838344573975}
07/25/2024 06:27:48 - INFO - __main__ - Step 188: {'lr': 0.00013357142857142856, 'samples': 9024, 'steps': 187, 'loss/train': 6.210916042327881}
07/25/2024 06:27:49 - INFO - __main__ - Step 189: {'lr': 0.00013428571428571428, 'samples': 9072, 'steps': 188, 'loss/train': 5.803860187530518}
07/25/2024 06:27:49 - INFO - __main__ - Step 190: {'lr': 0.000135, 'samples': 9120, 'steps': 189, 'loss/train': 6.666335105895996}
07/25/2024 06:27:49 - INFO - __main__ - Step 191: {'lr': 0.0001357142857142857, 'samples': 9168, 'steps': 190, 'loss/train': 5.624790668487549}
07/25/2024 06:27:49 - INFO - __main__ - Step 192: {'lr': 0.00013642857142857144, 'samples': 9216, 'steps': 191, 'loss/train': 5.217100143432617}
07/25/2024 06:27:50 - INFO - __main__ - Step 193: {'lr': 0.00013714285714285716, 'samples': 9264, 'steps': 192, 'loss/train': 5.951303482055664}
07/25/2024 06:27:50 - INFO - __main__ - Step 194: {'lr': 0.00013785714285714285, 'samples': 9312, 'steps': 193, 'loss/train': 5.851853847503662}
07/25/2024 06:27:50 - INFO - __main__ - Step 195: {'lr': 0.00013857142857142857, 'samples': 9360, 'steps': 194, 'loss/train': 5.776468276977539}
07/25/2024 06:27:51 - INFO - __main__ - Step 196: {'lr': 0.0001392857142857143, 'samples': 9408, 'steps': 195, 'loss/train': 5.7882866859436035}
07/25/2024 06:27:51 - INFO - __main__ - Step 197: {'lr': 0.00014000000000000001, 'samples': 9456, 'steps': 196, 'loss/train': 5.621963024139404}
07/25/2024 06:27:51 - INFO - __main__ - Step 198: {'lr': 0.0001407142857142857, 'samples': 9504, 'steps': 197, 'loss/train': 5.277397632598877}
07/25/2024 06:27:51 - INFO - __main__ - Step 199: {'lr': 0.00014142857142857143, 'samples': 9552, 'steps': 198, 'loss/train': 5.9324951171875}
07/25/2024 06:27:52 - INFO - __main__ - Step 200: {'lr': 0.00014214285714285715, 'samples': 9600, 'steps': 199, 'loss/train': 6.0901618003845215}
07/25/2024 06:27:52 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:27:52 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:27:55 - INFO - __main__ - Step 200: {'loss/eval': 6.142789840698242, 'perplexity': 465.3500061035156}
07/25/2024 06:27:56 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:27:56 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:27:56 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:27:58 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:28:59 - WARNING - huggingface_hub.repository - Several commits (4) will be pushed upstream.
07/25/2024 06:28:59 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:29:25 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
dcc8019..d02b805 celestial-aardvark-128 -> celestial-aardvark-128
07/25/2024 06:29:25 - INFO - __main__ - Step 201: {'lr': 0.00014285714285714284, 'samples': 9648, 'steps': 200, 'loss/train': 5.745926856994629}
07/25/2024 06:29:25 - INFO - __main__ - Step 202: {'lr': 0.0001435714285714286, 'samples': 9696, 'steps': 201, 'loss/train': 6.288934707641602}
07/25/2024 06:29:26 - INFO - __main__ - Step 203: {'lr': 0.0001442857142857143, 'samples': 9744, 'steps': 202, 'loss/train': 6.304495811462402}
07/25/2024 06:29:26 - INFO - __main__ - Step 204: {'lr': 0.000145, 'samples': 9792, 'steps': 203, 'loss/train': 6.896693706512451}
07/25/2024 06:29:26 - INFO - __main__ - Step 205: {'lr': 0.00014571428571428572, 'samples': 9840, 'steps': 204, 'loss/train': 5.75565767288208}
07/25/2024 06:29:26 - INFO - __main__ - Step 206: {'lr': 0.00014642857142857144, 'samples': 9888, 'steps': 205, 'loss/train': 6.053487300872803}
07/25/2024 06:29:27 - INFO - __main__ - Step 207: {'lr': 0.00014714285714285713, 'samples': 9936, 'steps': 206, 'loss/train': 5.872729301452637}
07/25/2024 06:29:27 - INFO - __main__ - Step 208: {'lr': 0.00014785714285714285, 'samples': 9984, 'steps': 207, 'loss/train': 7.389420509338379}
07/25/2024 06:29:27 - INFO - __main__ - Step 209: {'lr': 0.00014857142857142857, 'samples': 10032, 'steps': 208, 'loss/train': 6.749051570892334}
07/25/2024 06:29:27 - INFO - __main__ - Step 210: {'lr': 0.0001492857142857143, 'samples': 10080, 'steps': 209, 'loss/train': 5.964937210083008}
07/25/2024 06:29:28 - INFO - __main__ - Step 211: {'lr': 0.00015, 'samples': 10128, 'steps': 210, 'loss/train': 6.29296350479126}
07/25/2024 06:29:28 - INFO - __main__ - Step 212: {'lr': 0.0001507142857142857, 'samples': 10176, 'steps': 211, 'loss/train': 6.124290466308594}
07/25/2024 06:29:28 - INFO - __main__ - Step 213: {'lr': 0.00015142857142857145, 'samples': 10224, 'steps': 212, 'loss/train': 6.875829219818115}
07/25/2024 06:29:29 - INFO - __main__ - Step 214: {'lr': 0.00015214285714285715, 'samples': 10272, 'steps': 213, 'loss/train': 6.973008155822754}
07/25/2024 06:29:29 - INFO - __main__ - Step 215: {'lr': 0.00015285714285714287, 'samples': 10320, 'steps': 214, 'loss/train': 6.136086940765381}
07/25/2024 06:29:29 - INFO - __main__ - Step 216: {'lr': 0.0001535714285714286, 'samples': 10368, 'steps': 215, 'loss/train': 5.827876567840576}
07/25/2024 06:29:29 - INFO - __main__ - Step 217: {'lr': 0.00015428571428571428, 'samples': 10416, 'steps': 216, 'loss/train': 6.297738552093506}
07/25/2024 06:29:30 - INFO - __main__ - Step 218: {'lr': 0.000155, 'samples': 10464, 'steps': 217, 'loss/train': 5.124302387237549}
07/25/2024 06:29:30 - INFO - __main__ - Step 219: {'lr': 0.00015571428571428572, 'samples': 10512, 'steps': 218, 'loss/train': 5.82398796081543}
07/25/2024 06:29:30 - INFO - __main__ - Step 220: {'lr': 0.0001564285714285714, 'samples': 10560, 'steps': 219, 'loss/train': 5.920914649963379}
07/25/2024 06:29:31 - INFO - __main__ - Step 221: {'lr': 0.00015714285714285713, 'samples': 10608, 'steps': 220, 'loss/train': 5.506519317626953}
07/25/2024 06:29:31 - INFO - __main__ - Step 222: {'lr': 0.00015785714285714285, 'samples': 10656, 'steps': 221, 'loss/train': 5.194490432739258}
07/25/2024 06:29:31 - INFO - __main__ - Step 223: {'lr': 0.00015857142857142857, 'samples': 10704, 'steps': 222, 'loss/train': 6.241917610168457}
07/25/2024 06:29:31 - INFO - __main__ - Step 224: {'lr': 0.0001592857142857143, 'samples': 10752, 'steps': 223, 'loss/train': 5.662716388702393}
07/25/2024 06:29:32 - INFO - __main__ - Step 225: {'lr': 0.00016, 'samples': 10800, 'steps': 224, 'loss/train': 5.275988578796387}
07/25/2024 06:29:32 - INFO - __main__ - Step 226: {'lr': 0.00016071428571428573, 'samples': 10848, 'steps': 225, 'loss/train': 5.916398048400879}
07/25/2024 06:29:32 - INFO - __main__ - Step 227: {'lr': 0.00016142857142857143, 'samples': 10896, 'steps': 226, 'loss/train': 5.93534517288208}
07/25/2024 06:29:33 - INFO - __main__ - Step 228: {'lr': 0.00016214285714285715, 'samples': 10944, 'steps': 227, 'loss/train': 6.050380229949951}
07/25/2024 06:29:33 - INFO - __main__ - Step 229: {'lr': 0.00016285714285714287, 'samples': 10992, 'steps': 228, 'loss/train': 6.600334644317627}
07/25/2024 06:29:33 - INFO - __main__ - Step 230: {'lr': 0.00016357142857142856, 'samples': 11040, 'steps': 229, 'loss/train': 6.150309085845947}
07/25/2024 06:29:33 - INFO - __main__ - Step 231: {'lr': 0.00016428571428571428, 'samples': 11088, 'steps': 230, 'loss/train': 6.019353866577148}
07/25/2024 06:29:34 - INFO - __main__ - Step 232: {'lr': 0.000165, 'samples': 11136, 'steps': 231, 'loss/train': 7.122209548950195}
07/25/2024 06:29:34 - INFO - __main__ - Step 233: {'lr': 0.00016571428571428572, 'samples': 11184, 'steps': 232, 'loss/train': 5.891404151916504}
07/25/2024 06:29:34 - INFO - __main__ - Step 234: {'lr': 0.00016642857142857144, 'samples': 11232, 'steps': 233, 'loss/train': 5.697052955627441}
07/25/2024 06:29:34 - INFO - __main__ - Step 235: {'lr': 0.00016714285714285716, 'samples': 11280, 'steps': 234, 'loss/train': 5.768013954162598}
07/25/2024 06:29:35 - INFO - __main__ - Step 236: {'lr': 0.00016785714285714285, 'samples': 11328, 'steps': 235, 'loss/train': 5.943960666656494}
07/25/2024 06:29:35 - INFO - __main__ - Step 237: {'lr': 0.00016857142857142857, 'samples': 11376, 'steps': 236, 'loss/train': 7.096799850463867}
07/25/2024 06:29:35 - INFO - __main__ - Step 238: {'lr': 0.0001692857142857143, 'samples': 11424, 'steps': 237, 'loss/train': 7.258213996887207}
07/25/2024 06:29:36 - INFO - __main__ - Step 239: {'lr': 0.00017, 'samples': 11472, 'steps': 238, 'loss/train': 5.474708080291748}
07/25/2024 06:29:36 - INFO - __main__ - Step 240: {'lr': 0.0001707142857142857, 'samples': 11520, 'steps': 239, 'loss/train': 5.929581642150879}
07/25/2024 06:29:36 - INFO - __main__ - Step 241: {'lr': 0.00017142857142857143, 'samples': 11568, 'steps': 240, 'loss/train': 5.396873950958252}
07/25/2024 06:29:36 - INFO - __main__ - Step 242: {'lr': 0.00017214285714285715, 'samples': 11616, 'steps': 241, 'loss/train': 5.90254020690918}
07/25/2024 06:29:37 - INFO - __main__ - Step 243: {'lr': 0.00017285714285714287, 'samples': 11664, 'steps': 242, 'loss/train': 5.579410076141357}
07/25/2024 06:29:37 - INFO - __main__ - Step 244: {'lr': 0.00017357142857142859, 'samples': 11712, 'steps': 243, 'loss/train': 6.5500946044921875}
07/25/2024 06:29:37 - INFO - __main__ - Step 245: {'lr': 0.0001742857142857143, 'samples': 11760, 'steps': 244, 'loss/train': 6.13820219039917}
07/25/2024 06:29:38 - INFO - __main__ - Step 246: {'lr': 0.000175, 'samples': 11808, 'steps': 245, 'loss/train': 5.283195972442627}
07/25/2024 06:29:38 - INFO - __main__ - Step 247: {'lr': 0.00017571428571428572, 'samples': 11856, 'steps': 246, 'loss/train': 5.3597211837768555}
07/25/2024 06:29:38 - INFO - __main__ - Step 248: {'lr': 0.00017642857142857144, 'samples': 11904, 'steps': 247, 'loss/train': 5.715787410736084}
07/25/2024 06:29:38 - INFO - __main__ - Step 249: {'lr': 0.00017714285714285713, 'samples': 11952, 'steps': 248, 'loss/train': 5.988589286804199}
07/25/2024 06:29:39 - INFO - __main__ - Step 250: {'lr': 0.00017785714285714285, 'samples': 12000, 'steps': 249, 'loss/train': 6.131600856781006}
07/25/2024 06:29:39 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:29:39 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:29:42 - INFO - __main__ - Step 250: {'loss/eval': 5.960291385650635, 'perplexity': 387.72308349609375}
07/25/2024 06:29:43 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:29:43 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:29:43 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:29:44 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:30:45 - WARNING - huggingface_hub.repository - Several commits (5) will be pushed upstream.
07/25/2024 06:30:45 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:31:13 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
d02b805..4d31a9f celestial-aardvark-128 -> celestial-aardvark-128
07/25/2024 06:31:13 - INFO - __main__ - Step 251: {'lr': 0.00017857142857142857, 'samples': 12048, 'steps': 250, 'loss/train': 5.627201557159424}
07/25/2024 06:31:14 - INFO - __main__ - Step 252: {'lr': 0.0001792857142857143, 'samples': 12096, 'steps': 251, 'loss/train': 6.002392292022705}
07/25/2024 06:31:14 - INFO - __main__ - Step 253: {'lr': 0.00017999999999999998, 'samples': 12144, 'steps': 252, 'loss/train': 5.872100353240967}
07/25/2024 06:31:14 - INFO - __main__ - Step 254: {'lr': 0.00018071428571428573, 'samples': 12192, 'steps': 253, 'loss/train': 6.0609612464904785}
07/25/2024 06:31:14 - INFO - __main__ - Step 255: {'lr': 0.00018142857142857145, 'samples': 12240, 'steps': 254, 'loss/train': 6.275620460510254}
07/25/2024 06:31:15 - INFO - __main__ - Step 256: {'lr': 0.00018214285714285714, 'samples': 12288, 'steps': 255, 'loss/train': 6.78406286239624}
07/25/2024 06:31:15 - INFO - __main__ - Step 257: {'lr': 0.00018285714285714286, 'samples': 12336, 'steps': 256, 'loss/train': 6.069532871246338}
07/25/2024 06:31:15 - INFO - __main__ - Step 258: {'lr': 0.00018357142857142858, 'samples': 12384, 'steps': 257, 'loss/train': 5.567933559417725}
07/25/2024 06:31:16 - INFO - __main__ - Step 259: {'lr': 0.00018428571428571428, 'samples': 12432, 'steps': 258, 'loss/train': 6.152994632720947}
07/25/2024 06:31:16 - INFO - __main__ - Step 260: {'lr': 0.000185, 'samples': 12480, 'steps': 259, 'loss/train': 5.771788120269775}
07/25/2024 06:31:16 - INFO - __main__ - Step 261: {'lr': 0.00018571428571428572, 'samples': 12528, 'steps': 260, 'loss/train': 5.717995643615723}
07/25/2024 06:31:16 - INFO - __main__ - Step 262: {'lr': 0.0001864285714285714, 'samples': 12576, 'steps': 261, 'loss/train': 5.839302062988281}
07/25/2024 06:31:17 - INFO - __main__ - Step 263: {'lr': 0.00018714285714285713, 'samples': 12624, 'steps': 262, 'loss/train': 5.257016658782959}
07/25/2024 06:31:17 - INFO - __main__ - Step 264: {'lr': 0.00018785714285714288, 'samples': 12672, 'steps': 263, 'loss/train': 6.241714000701904}
07/25/2024 06:31:17 - INFO - __main__ - Step 265: {'lr': 0.0001885714285714286, 'samples': 12720, 'steps': 264, 'loss/train': 6.639944553375244}
07/25/2024 06:31:17 - INFO - __main__ - Step 266: {'lr': 0.0001892857142857143, 'samples': 12768, 'steps': 265, 'loss/train': 5.12101936340332}
07/25/2024 06:31:18 - INFO - __main__ - Step 267: {'lr': 0.00019, 'samples': 12816, 'steps': 266, 'loss/train': 5.190861701965332}
07/25/2024 06:31:18 - INFO - __main__ - Step 268: {'lr': 0.00019071428571428573, 'samples': 12864, 'steps': 267, 'loss/train': 6.486904621124268}
07/25/2024 06:31:18 - INFO - __main__ - Step 269: {'lr': 0.00019142857142857142, 'samples': 12912, 'steps': 268, 'loss/train': 5.638678073883057}
07/25/2024 06:31:19 - INFO - __main__ - Step 270: {'lr': 0.00019214285714285714, 'samples': 12960, 'steps': 269, 'loss/train': 5.088951110839844}
07/25/2024 06:31:19 - INFO - __main__ - Step 271: {'lr': 0.00019285714285714286, 'samples': 13008, 'steps': 270, 'loss/train': 5.137499809265137}
07/25/2024 06:31:19 - INFO - __main__ - Step 272: {'lr': 0.00019357142857142856, 'samples': 13056, 'steps': 271, 'loss/train': 4.604417324066162}
07/25/2024 06:31:19 - INFO - __main__ - Step 273: {'lr': 0.00019428571428571428, 'samples': 13104, 'steps': 272, 'loss/train': 5.781164646148682}
07/25/2024 06:31:20 - INFO - __main__ - Step 274: {'lr': 0.00019500000000000002, 'samples': 13152, 'steps': 273, 'loss/train': 6.4048309326171875}
07/25/2024 06:31:20 - INFO - __main__ - Step 275: {'lr': 0.00019571428571428572, 'samples': 13200, 'steps': 274, 'loss/train': 6.040492057800293}
07/25/2024 06:31:20 - INFO - __main__ - Step 276: {'lr': 0.00019642857142857144, 'samples': 13248, 'steps': 275, 'loss/train': 5.667052745819092}
07/25/2024 06:31:21 - INFO - __main__ - Step 277: {'lr': 0.00019714285714285716, 'samples': 13296, 'steps': 276, 'loss/train': 5.5247483253479}
07/25/2024 06:31:21 - INFO - __main__ - Step 278: {'lr': 0.00019785714285714288, 'samples': 13344, 'steps': 277, 'loss/train': 5.584035396575928}
07/25/2024 06:31:21 - INFO - __main__ - Step 279: {'lr': 0.00019857142857142857, 'samples': 13392, 'steps': 278, 'loss/train': 5.613864898681641}
07/25/2024 06:31:21 - INFO - __main__ - Step 280: {'lr': 0.0001992857142857143, 'samples': 13440, 'steps': 279, 'loss/train': 5.550878524780273}
07/25/2024 06:31:22 - INFO - __main__ - Step 281: {'lr': 0.0002, 'samples': 13488, 'steps': 280, 'loss/train': 6.560573101043701}
07/25/2024 06:31:22 - INFO - __main__ - Step 282: {'lr': 0.0002007142857142857, 'samples': 13536, 'steps': 281, 'loss/train': 5.38557767868042}
07/25/2024 06:31:22 - INFO - __main__ - Step 283: {'lr': 0.00020142857142857142, 'samples': 13584, 'steps': 282, 'loss/train': 6.759729862213135}
07/25/2024 06:31:23 - INFO - __main__ - Step 284: {'lr': 0.00020214285714285714, 'samples': 13632, 'steps': 283, 'loss/train': 6.179801940917969}
07/25/2024 06:31:23 - INFO - __main__ - Step 285: {'lr': 0.00020285714285714286, 'samples': 13680, 'steps': 284, 'loss/train': 5.904941082000732}
07/25/2024 06:31:23 - INFO - __main__ - Step 286: {'lr': 0.00020357142857142858, 'samples': 13728, 'steps': 285, 'loss/train': 5.76945161819458}
07/25/2024 06:31:23 - INFO - __main__ - Step 287: {'lr': 0.0002042857142857143, 'samples': 13776, 'steps': 286, 'loss/train': 8.2332124710083}
07/25/2024 06:31:24 - INFO - __main__ - Step 288: {'lr': 0.000205, 'samples': 13824, 'steps': 287, 'loss/train': 5.863339900970459}
07/25/2024 06:31:24 - INFO - __main__ - Step 289: {'lr': 0.00020571428571428572, 'samples': 13872, 'steps': 288, 'loss/train': 6.213030815124512}
07/25/2024 06:31:24 - INFO - __main__ - Step 290: {'lr': 0.00020642857142857144, 'samples': 13920, 'steps': 289, 'loss/train': 4.734172821044922}
07/25/2024 06:31:25 - INFO - __main__ - Step 291: {'lr': 0.00020714285714285716, 'samples': 13968, 'steps': 290, 'loss/train': 5.674801349639893}
07/25/2024 06:31:25 - INFO - __main__ - Step 292: {'lr': 0.00020785714285714285, 'samples': 14016, 'steps': 291, 'loss/train': 5.784888744354248}
07/25/2024 06:31:25 - INFO - __main__ - Step 293: {'lr': 0.00020857142857142857, 'samples': 14064, 'steps': 292, 'loss/train': 5.5319390296936035}
07/25/2024 06:31:25 - INFO - __main__ - Step 294: {'lr': 0.0002092857142857143, 'samples': 14112, 'steps': 293, 'loss/train': 5.685769557952881}
07/25/2024 06:31:26 - INFO - __main__ - Step 295: {'lr': 0.00021, 'samples': 14160, 'steps': 294, 'loss/train': 5.418774604797363}
07/25/2024 06:31:26 - INFO - __main__ - Step 296: {'lr': 0.00021071428571428573, 'samples': 14208, 'steps': 295, 'loss/train': 4.068847179412842}
07/25/2024 06:31:26 - INFO - __main__ - Step 297: {'lr': 0.00021142857142857145, 'samples': 14256, 'steps': 296, 'loss/train': 5.367792129516602}
07/25/2024 06:31:26 - INFO - __main__ - Step 298: {'lr': 0.00021214285714285714, 'samples': 14304, 'steps': 297, 'loss/train': 5.713776588439941}
07/25/2024 06:31:27 - INFO - __main__ - Step 299: {'lr': 0.00021285714285714286, 'samples': 14352, 'steps': 298, 'loss/train': 5.603511810302734}
07/25/2024 06:31:27 - INFO - __main__ - Step 300: {'lr': 0.00021357142857142858, 'samples': 14400, 'steps': 299, 'loss/train': 6.163950443267822}
07/25/2024 06:31:27 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:31:27 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:31:31 - INFO - __main__ - Step 300: {'loss/eval': 5.79922342300415, 'perplexity': 330.0431823730469}
07/25/2024 06:31:31 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:31:31 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:31:32 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:31:33 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl
07/25/2024 06:32:35 - WARNING - huggingface_hub.repository - Several commits (6) will be pushed upstream.
07/25/2024 06:32:35 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
07/25/2024 06:33:00 - WARNING - huggingface_hub.repository - To https://huggingface.co/shng2025/gptesla-small
4d31a9f..aae7e8d celestial-aardvark-128 -> celestial-aardvark-128
07/25/2024 06:33:00 - INFO - __main__ - Step 301: {'lr': 0.00021428571428571427, 'samples': 14448, 'steps': 300, 'loss/train': 5.406757354736328}
07/25/2024 06:33:00 - INFO - __main__ - Step 302: {'lr': 0.000215, 'samples': 14496, 'steps': 301, 'loss/train': 5.90996789932251}
07/25/2024 06:33:01 - INFO - __main__ - Step 303: {'lr': 0.00021571428571428571, 'samples': 14544, 'steps': 302, 'loss/train': 6.092479228973389}
07/25/2024 06:33:01 - INFO - __main__ - Step 304: {'lr': 0.00021642857142857143, 'samples': 14592, 'steps': 303, 'loss/train': 5.216100215911865}
07/25/2024 06:33:01 - INFO - __main__ - Step 305: {'lr': 0.00021714285714285715, 'samples': 14640, 'steps': 304, 'loss/train': 5.621682643890381}
07/25/2024 06:33:01 - INFO - __main__ - Step 306: {'lr': 0.00021785714285714287, 'samples': 14688, 'steps': 305, 'loss/train': 5.823093414306641}
07/25/2024 06:33:02 - INFO - __main__ - Step 307: {'lr': 0.0002185714285714286, 'samples': 14736, 'steps': 306, 'loss/train': 6.228525161743164}
07/25/2024 06:33:02 - INFO - __main__ - Step 308: {'lr': 0.0002192857142857143, 'samples': 14784, 'steps': 307, 'loss/train': 5.9510087966918945}
07/25/2024 06:33:02 - INFO - __main__ - Step 309: {'lr': 0.00022, 'samples': 14832, 'steps': 308, 'loss/train': 5.266091346740723}
07/25/2024 06:33:03 - INFO - __main__ - Step 310: {'lr': 0.00022071428571428573, 'samples': 14880, 'steps': 309, 'loss/train': 5.217267036437988}
07/25/2024 06:33:03 - INFO - __main__ - Step 311: {'lr': 0.00022142857142857142, 'samples': 14928, 'steps': 310, 'loss/train': 7.697060585021973}
07/25/2024 06:33:03 - INFO - __main__ - Step 312: {'lr': 0.00022214285714285714, 'samples': 14976, 'steps': 311, 'loss/train': 5.666650772094727}
07/25/2024 06:33:03 - INFO - __main__ - Step 313: {'lr': 0.00022285714285714286, 'samples': 15024, 'steps': 312, 'loss/train': 6.425085067749023}
07/25/2024 06:33:04 - INFO - __main__ - Step 314: {'lr': 0.00022357142857142855, 'samples': 15072, 'steps': 313, 'loss/train': 4.396389007568359}
07/25/2024 06:33:04 - INFO - __main__ - Step 315: {'lr': 0.0002242857142857143, 'samples': 15120, 'steps': 314, 'loss/train': 5.2941131591796875}
07/25/2024 06:33:04 - INFO - __main__ - Step 316: {'lr': 0.00022500000000000002, 'samples': 15168, 'steps': 315, 'loss/train': 5.752312183380127}
07/25/2024 06:33:04 - INFO - __main__ - Step 317: {'lr': 0.00022571428571428571, 'samples': 15216, 'steps': 316, 'loss/train': 6.089960098266602}
07/25/2024 06:33:05 - INFO - __main__ - Step 318: {'lr': 0.00022642857142857143, 'samples': 15264, 'steps': 317, 'loss/train': 5.828670978546143}
07/25/2024 06:33:05 - INFO - __main__ - Step 319: {'lr': 0.00022714285714285715, 'samples': 15312, 'steps': 318, 'loss/train': 5.34361457824707}
07/25/2024 06:33:05 - INFO - __main__ - Step 320: {'lr': 0.00022785714285714287, 'samples': 15360, 'steps': 319, 'loss/train': 3.9433271884918213}
07/25/2024 06:33:06 - INFO - __main__ - Step 321: {'lr': 0.00022857142857142857, 'samples': 15408, 'steps': 320, 'loss/train': 5.489405632019043}
07/25/2024 06:33:06 - INFO - __main__ - Step 322: {'lr': 0.0002292857142857143, 'samples': 15456, 'steps': 321, 'loss/train': 5.065426826477051}
07/25/2024 06:33:06 - INFO - __main__ - Step 323: {'lr': 0.00023, 'samples': 15504, 'steps': 322, 'loss/train': 4.657402038574219}
07/25/2024 06:33:06 - INFO - __main__ - Step 324: {'lr': 0.0002307142857142857, 'samples': 15552, 'steps': 323, 'loss/train': 6.042489528656006}
07/25/2024 06:33:07 - INFO - __main__ - Step 325: {'lr': 0.00023142857142857142, 'samples': 15600, 'steps': 324, 'loss/train': 5.562082290649414}
07/25/2024 06:33:07 - INFO - __main__ - Step 326: {'lr': 0.00023214285714285717, 'samples': 15648, 'steps': 325, 'loss/train': 5.726541519165039}
07/25/2024 06:33:07 - INFO - __main__ - Step 327: {'lr': 0.00023285714285714286, 'samples': 15696, 'steps': 326, 'loss/train': 5.573945045471191}
07/25/2024 06:33:08 - INFO - __main__ - Step 328: {'lr': 0.00023357142857142858, 'samples': 15744, 'steps': 327, 'loss/train': 6.105917930603027}
07/25/2024 06:33:08 - INFO - __main__ - Step 329: {'lr': 0.0002342857142857143, 'samples': 15792, 'steps': 328, 'loss/train': 5.546865463256836}
07/25/2024 06:33:08 - INFO - __main__ - Step 330: {'lr': 0.000235, 'samples': 15840, 'steps': 329, 'loss/train': 5.543821334838867}
07/25/2024 06:33:08 - INFO - __main__ - Step 331: {'lr': 0.0002357142857142857, 'samples': 15888, 'steps': 330, 'loss/train': 5.6774582862854}
07/25/2024 06:33:09 - INFO - __main__ - Step 332: {'lr': 0.00023642857142857143, 'samples': 15936, 'steps': 331, 'loss/train': 5.767722129821777}
07/25/2024 06:33:09 - INFO - __main__ - Step 333: {'lr': 0.00023714285714285715, 'samples': 15984, 'steps': 332, 'loss/train': 5.70899772644043}
07/25/2024 06:33:09 - INFO - __main__ - Step 334: {'lr': 0.00023785714285714285, 'samples': 16032, 'steps': 333, 'loss/train': 5.67036247253418}
07/25/2024 06:33:10 - INFO - __main__ - Step 335: {'lr': 0.00023857142857142857, 'samples': 16080, 'steps': 334, 'loss/train': 5.325812339782715}
07/25/2024 06:33:10 - INFO - __main__ - Step 336: {'lr': 0.0002392857142857143, 'samples': 16128, 'steps': 335, 'loss/train': 5.349172592163086}
07/25/2024 06:33:10 - INFO - __main__ - Step 337: {'lr': 0.00024, 'samples': 16176, 'steps': 336, 'loss/train': 5.448930263519287}
07/25/2024 06:33:10 - INFO - __main__ - Step 338: {'lr': 0.00024071428571428573, 'samples': 16224, 'steps': 337, 'loss/train': 3.7934205532073975}
07/25/2024 06:33:11 - INFO - __main__ - Step 339: {'lr': 0.00024142857142857145, 'samples': 16272, 'steps': 338, 'loss/train': 5.1056013107299805}
07/25/2024 06:33:11 - INFO - __main__ - Step 340: {'lr': 0.00024214285714285714, 'samples': 16320, 'steps': 339, 'loss/train': 5.9682464599609375}
07/25/2024 06:33:11 - INFO - __main__ - Step 341: {'lr': 0.00024285714285714286, 'samples': 16368, 'steps': 340, 'loss/train': 5.546884536743164}
07/25/2024 06:33:12 - INFO - __main__ - Step 342: {'lr': 0.00024357142857142858, 'samples': 16416, 'steps': 341, 'loss/train': 6.586970329284668}
07/25/2024 06:33:12 - INFO - __main__ - Step 343: {'lr': 0.0002442857142857143, 'samples': 16464, 'steps': 342, 'loss/train': 5.654937744140625}
07/25/2024 06:33:12 - INFO - __main__ - Step 344: {'lr': 0.000245, 'samples': 16512, 'steps': 343, 'loss/train': 3.9033658504486084}
07/25/2024 06:33:12 - INFO - __main__ - Step 345: {'lr': 0.00024571428571428574, 'samples': 16560, 'steps': 344, 'loss/train': 6.266292095184326}
07/25/2024 06:33:13 - INFO - __main__ - Step 346: {'lr': 0.00024642857142857143, 'samples': 16608, 'steps': 345, 'loss/train': 5.5901007652282715}
07/25/2024 06:33:13 - INFO - __main__ - Step 347: {'lr': 0.0002471428571428571, 'samples': 16656, 'steps': 346, 'loss/train': 5.836148738861084}
07/25/2024 06:33:13 - INFO - __main__ - Step 348: {'lr': 0.00024785714285714287, 'samples': 16704, 'steps': 347, 'loss/train': 5.447431564331055}
07/25/2024 06:33:13 - INFO - __main__ - Step 349: {'lr': 0.00024857142857142857, 'samples': 16752, 'steps': 348, 'loss/train': 5.124023914337158}
07/25/2024 06:33:14 - INFO - __main__ - Step 350: {'lr': 0.00024928571428571426, 'samples': 16800, 'steps': 349, 'loss/train': 5.541380405426025}
07/25/2024 06:33:14 - INFO - __main__ - Evaluating and saving model checkpoint
07/25/2024 06:33:14 - DEBUG - datasets.iterable_dataset - dataloader worker#0, ': Starting to iterate over 1/1 shards.
07/25/2024 06:33:17 - INFO - __main__ - Step 350: {'loss/eval': 5.6890645027160645, 'perplexity': 295.616943359375}
07/25/2024 06:33:18 - INFO - accelerate.accelerator - Saving current state to my_checkpoint
07/25/2024 06:33:18 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
07/25/2024 06:33:18 - INFO - accelerate.checkpointing - Model weights saved in my_checkpoint/model.safetensors
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Optimizer state saved in my_checkpoint/optimizer.bin
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in my_checkpoint/sampler.bin
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in my_checkpoint/sampler_1.bin
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Gradient scaler state saved in my_checkpoint/scaler.pt
07/25/2024 06:33:20 - INFO - accelerate.checkpointing - Random states saved in my_checkpoint/random_states_0.pkl