[2024-10-20 18:25:17,510][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[2024-10-20 18:25:17,521][Main][INFO] - Distributed environment: DistributedType.NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Mixed precision type: bf16
[2024-10-20 18:25:17,522][Main][INFO] - Working directory is /workspace/nanoT5/logs/2024-10-20/18-25-17
[2024-10-20 18:31:35,111][Main][INFO] - [train] Step 25 out of 65536 | Loss --> 155.837 | Loss_ntp --> 76.275 | Loss_mlm --> 79.561 | Grad_l2 --> 476.354 | Weights_l2 --> 7701.821 | Lr --> 0.001 | Seconds_per_step --> 14.044 |
[2024-10-20 18:35:35,171][Main][INFO] - [train] Step 50 out of 65536 | Loss --> 98.644 | Loss_ntp --> 48.540 | Loss_mlm --> 50.105 | Grad_l2 --> 234.932 | Weights_l2 --> 7701.813 | Lr --> 0.001 | Seconds_per_step --> 9.602 |
[2024-10-20 18:39:35,197][Main][INFO] - [train] Step 75 out of 65536 | Loss --> 86.994 | Loss_ntp --> 42.861 | Loss_mlm --> 44.133 | Grad_l2 --> 180.388 | Weights_l2 --> 7701.806 | Lr --> 0.001 | Seconds_per_step --> 9.601 |
[2024-10-20 18:43:35,733][Main][INFO] - [train] Step 100 out of 65536 | Loss --> 80.568 | Loss_ntp --> 39.806 | Loss_mlm --> 40.762 | Grad_l2 --> 156.732 | Weights_l2 --> 7701.800 | Lr --> 0.001 | Seconds_per_step --> 9.621 |
[2024-10-20 18:47:37,016][Main][INFO] - [train] Step 125 out of 65536 | Loss --> 77.131 | Loss_ntp --> 38.127 | Loss_mlm --> 39.004 | Grad_l2 --> 179.590 | Weights_l2 --> 7701.794 | Lr --> 0.001 | Seconds_per_step --> 9.651 |
[2024-10-20 18:51:38,437][Main][INFO] - [train] Step 150 out of 65536 | Loss --> 73.900 | Loss_ntp --> 36.620 | Loss_mlm --> 37.281 | Grad_l2 --> 161.591 | Weights_l2 --> 7701.789 | Lr --> 0.001 | Seconds_per_step --> 9.657 |
[2024-10-20 18:55:39,020][Main][INFO] - [train] Step 175 out of 65536 | Loss --> 72.118 | Loss_ntp --> 
35.763 | Loss_mlm --> 36.355 | Grad_l2 --> 161.741 | Weights_l2 --> 7701.783 | Lr --> 0.001 | Seconds_per_step --> 9.623 |
[2024-10-20 18:59:40,344][Main][INFO] - [train] Step 200 out of 65536 | Loss --> 70.712 | Loss_ntp --> 35.041 | Loss_mlm --> 35.671 | Grad_l2 --> 154.736 | Weights_l2 --> 7701.778 | Lr --> 0.001 | Seconds_per_step --> 9.653 |
[2024-10-20 19:03:39,817][Main][INFO] - [train] Step 225 out of 65536 | Loss --> 69.050 | Loss_ntp --> 34.233 | Loss_mlm --> 34.817 | Grad_l2 --> 106.908 | Weights_l2 --> 7701.772 | Lr --> 0.001 | Seconds_per_step --> 9.579 |
[2024-10-20 19:07:41,876][Main][INFO] - [train] Step 250 out of 65536 | Loss --> 68.595 | Loss_ntp --> 33.970 | Loss_mlm --> 34.625 | Grad_l2 --> 126.557 | Weights_l2 --> 7701.767 | Lr --> 0.001 | Seconds_per_step --> 9.682 |
[2024-10-20 19:11:43,944][Main][INFO] - [train] Step 275 out of 65536 | Loss --> 67.141 | Loss_ntp --> 33.297 | Loss_mlm --> 33.844 | Grad_l2 --> 114.874 | Weights_l2 --> 7701.762 | Lr --> 0.001 | Seconds_per_step --> 9.683 |
[2024-10-20 19:15:43,786][Main][INFO] - [train] Step 300 out of 65536 | Loss --> 65.916 | Loss_ntp --> 32.693 | Loss_mlm --> 33.223 | Grad_l2 --> 89.430 | Weights_l2 --> 7701.757 | Lr --> 0.001 | Seconds_per_step --> 9.594 |
[2024-10-20 19:19:45,206][Main][INFO] - [train] Step 325 out of 65536 | Loss --> 65.322 | Loss_ntp --> 32.362 | Loss_mlm --> 32.960 | Grad_l2 --> 97.785 | Weights_l2 --> 7701.751 | Lr --> 0.001 | Seconds_per_step --> 9.657 |
[2024-10-20 19:23:45,072][Main][INFO] - [train] Step 350 out of 65536 | Loss --> 64.367 | Loss_ntp --> 31.937 | Loss_mlm --> 32.430 | Grad_l2 --> 83.882 | Weights_l2 --> 7701.746 | Lr --> 0.001 | Seconds_per_step --> 9.595 |
[2024-10-20 19:27:46,534][Main][INFO] - [train] Step 375 out of 65536 | Loss --> 63.409 | Loss_ntp --> 31.433 | Loss_mlm --> 31.975 | Grad_l2 --> 75.548 | Weights_l2 --> 7701.741 | Lr --> 0.001 | Seconds_per_step --> 9.658 |
[2024-10-20 19:31:45,390][Main][INFO] - [train] Step 400 out of 65536 | 
Loss --> 62.292 | Loss_ntp --> 30.925 | Loss_mlm --> 31.367 | Grad_l2 --> 72.299 | Weights_l2 --> 7701.736 | Lr --> 0.001 | Seconds_per_step --> 9.554 |
[2024-10-20 19:35:46,689][Main][INFO] - [train] Step 425 out of 65536 | Loss --> 61.685 | Loss_ntp --> 30.585 | Loss_mlm --> 31.100 | Grad_l2 --> 73.838 | Weights_l2 --> 7701.731 | Lr --> 0.001 | Seconds_per_step --> 9.652 |
[2024-10-20 19:39:46,030][Main][INFO] - [train] Step 450 out of 65536 | Loss --> 61.416 | Loss_ntp --> 30.509 | Loss_mlm --> 30.907 | Grad_l2 --> 79.820 | Weights_l2 --> 7701.726 | Lr --> 0.001 | Seconds_per_step --> 9.573 |
[2024-10-20 19:43:47,298][Main][INFO] - [train] Step 475 out of 65536 | Loss --> 60.536 | Loss_ntp --> 30.069 | Loss_mlm --> 30.467 | Grad_l2 --> 59.074 | Weights_l2 --> 7701.722 | Lr --> 0.001 | Seconds_per_step --> 9.651 |
[2024-10-20 19:47:48,778][Main][INFO] - [train] Step 500 out of 65536 | Loss --> 60.085 | Loss_ntp --> 29.838 | Loss_mlm --> 30.246 | Grad_l2 --> 71.417 | Weights_l2 --> 7701.717 | Lr --> 0.001 | Seconds_per_step --> 9.659 |
[2024-10-20 19:49:25,862][Main][INFO] - [eval] Step 500 out of 65536 | Loss --> 57.611 | Loss_ntp --> 28.694 | Loss_mlm --> 28.917 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 97.080 |
[2024-10-20 19:53:26,482][Main][INFO] - [train] Step 525 out of 65536 | Loss --> 59.106 | Loss_ntp --> 29.371 | Loss_mlm --> 29.735 | Grad_l2 --> 56.829 | Weights_l2 --> 7701.712 | Lr --> 0.001 | Seconds_per_step --> 9.625 |
[2024-10-20 19:57:25,811][Main][INFO] - [train] Step 550 out of 65536 | Loss --> 58.185 | Loss_ntp --> 28.950 | Loss_mlm --> 29.235 | Grad_l2 --> 56.368 | Weights_l2 --> 7701.707 | Lr --> 0.001 | Seconds_per_step --> 9.573 |
[2024-10-20 20:01:26,095][Main][INFO] - [train] Step 575 out of 65536 | Loss --> 57.301 | Loss_ntp --> 28.480 | Loss_mlm --> 28.821 | Grad_l2 --> 39.860 | Weights_l2 --> 7701.703 | Lr --> 0.001 | Seconds_per_step --> 9.611 |
[2024-10-20 20:05:26,649][Main][INFO] - [train] 
Step 600 out of 65536 | Loss --> 56.020 | Loss_ntp --> 27.906 | Loss_mlm --> 28.115 | Grad_l2 --> 35.414 | Weights_l2 --> 7701.698 | Lr --> 0.001 | Seconds_per_step --> 9.622 |
[2024-10-20 20:09:28,597][Main][INFO] - [train] Step 625 out of 65536 | Loss --> 55.363 | Loss_ntp --> 27.524 | Loss_mlm --> 27.840 | Grad_l2 --> 50.531 | Weights_l2 --> 7701.694 | Lr --> 0.001 | Seconds_per_step --> 9.678 |
[2024-10-20 20:13:29,399][Main][INFO] - [train] Step 650 out of 65536 | Loss --> 54.803 | Loss_ntp --> 27.252 | Loss_mlm --> 27.551 | Grad_l2 --> 56.108 | Weights_l2 --> 7701.689 | Lr --> 0.001 | Seconds_per_step --> 9.632 |
[2024-10-20 20:17:31,948][Main][INFO] - [train] Step 675 out of 65536 | Loss --> 53.970 | Loss_ntp --> 26.793 | Loss_mlm --> 27.176 | Grad_l2 --> 46.473 | Weights_l2 --> 7701.685 | Lr --> 0.001 | Seconds_per_step --> 9.702 |
[2024-10-20 20:21:31,196][Main][INFO] - [train] Step 700 out of 65536 | Loss --> 53.056 | Loss_ntp --> 26.359 | Loss_mlm --> 26.697 | Grad_l2 --> 37.435 | Weights_l2 --> 7701.680 | Lr --> 0.001 | Seconds_per_step --> 9.570 |
[2024-10-20 20:25:33,347][Main][INFO] - [train] Step 725 out of 65536 | Loss --> 52.070 | Loss_ntp --> 25.876 | Loss_mlm --> 26.194 | Grad_l2 --> 43.881 | Weights_l2 --> 7701.676 | Lr --> 0.001 | Seconds_per_step --> 9.686 |
[2024-10-20 20:29:33,004][Main][INFO] - [train] Step 750 out of 65536 | Loss --> 51.191 | Loss_ntp --> 25.456 | Loss_mlm --> 25.735 | Grad_l2 --> 44.855 | Weights_l2 --> 7701.672 | Lr --> 0.001 | Seconds_per_step --> 9.586 |
[2024-10-20 20:33:34,557][Main][INFO] - [train] Step 775 out of 65536 | Loss --> 50.129 | Loss_ntp --> 24.891 | Loss_mlm --> 25.239 | Grad_l2 --> 40.117 | Weights_l2 --> 7701.667 | Lr --> 0.001 | Seconds_per_step --> 9.662 |
[2024-10-20 20:37:33,242][Main][INFO] - [train] Step 800 out of 65536 | Loss --> 49.019 | Loss_ntp --> 24.361 | Loss_mlm --> 24.658 | Grad_l2 --> 39.953 | Weights_l2 --> 7701.663 | Lr --> 0.001 | Seconds_per_step --> 9.547 |
[2024-10-20 
20:41:33,285][Main][INFO] - [train] Step 825 out of 65536 | Loss --> 48.160 | Loss_ntp --> 23.923 | Loss_mlm --> 24.238 | Grad_l2 --> 42.816 | Weights_l2 --> 7701.659 | Lr --> 0.001 | Seconds_per_step --> 9.602 |
[2024-10-20 20:45:34,352][Main][INFO] - [train] Step 850 out of 65536 | Loss --> 46.672 | Loss_ntp --> 23.149 | Loss_mlm --> 23.522 | Grad_l2 --> 42.230 | Weights_l2 --> 7701.654 | Lr --> 0.001 | Seconds_per_step --> 9.643 |
[2024-10-20 20:49:34,963][Main][INFO] - [train] Step 875 out of 65536 | Loss --> 44.855 | Loss_ntp --> 22.279 | Loss_mlm --> 22.575 | Grad_l2 --> 39.123 | Weights_l2 --> 7701.650 | Lr --> 0.001 | Seconds_per_step --> 9.624 |
[2024-10-20 20:53:36,677][Main][INFO] - [train] Step 900 out of 65536 | Loss --> 42.480 | Loss_ntp --> 21.057 | Loss_mlm --> 21.423 | Grad_l2 --> 50.501 | Weights_l2 --> 7701.645 | Lr --> 0.001 | Seconds_per_step --> 9.668 |
[2024-10-20 20:57:37,186][Main][INFO] - [train] Step 925 out of 65536 | Loss --> 40.028 | Loss_ntp --> 19.877 | Loss_mlm --> 20.151 | Grad_l2 --> 57.109 | Weights_l2 --> 7701.640 | Lr --> 0.001 | Seconds_per_step --> 9.620 |
[2024-10-20 21:01:38,800][Main][INFO] - [train] Step 950 out of 65536 | Loss --> 37.058 | Loss_ntp --> 18.359 | Loss_mlm --> 18.699 | Grad_l2 --> 78.443 | Weights_l2 --> 7701.634 | Lr --> 0.001 | Seconds_per_step --> 9.664 |
[2024-10-20 21:05:38,405][Main][INFO] - [train] Step 975 out of 65536 | Loss --> 33.534 | Loss_ntp --> 16.618 | Loss_mlm --> 16.917 | Grad_l2 --> 87.220 | Weights_l2 --> 7701.628 | Lr --> 0.001 | Seconds_per_step --> 9.584 |
[2024-10-20 21:09:41,153][Main][INFO] - [train] Step 1000 out of 65536 | Loss --> 29.988 | Loss_ntp --> 14.857 | Loss_mlm --> 15.131 | Grad_l2 --> 88.279 | Weights_l2 --> 7701.622 | Lr --> 0.001 | Seconds_per_step --> 9.710 |
[2024-10-20 21:10:10,310][Main][INFO] - [eval] Step 1000 out of 65536 | Loss --> 28.033 | Loss_ntp --> 13.938 | Loss_mlm --> 14.095 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time 
--> 29.143 |
[2024-10-20 21:14:10,580][Main][INFO] - [train] Step 1025 out of 65536 | Loss --> 26.588 | Loss_ntp --> 13.166 | Loss_mlm --> 13.423 | Grad_l2 --> 109.226 | Weights_l2 --> 7701.616 | Lr --> 0.001 | Seconds_per_step --> 9.611 |
[2024-10-20 21:18:12,558][Main][INFO] - [train] Step 1050 out of 65536 | Loss --> 23.850 | Loss_ntp --> 11.830 | Loss_mlm --> 12.020 | Grad_l2 --> 98.666 | Weights_l2 --> 7701.610 | Lr --> 0.001 | Seconds_per_step --> 9.679 |
[2024-10-20 21:22:11,593][Main][INFO] - [train] Step 1075 out of 65536 | Loss --> 21.589 | Loss_ntp --> 10.697 | Loss_mlm --> 10.892 | Grad_l2 --> 104.858 | Weights_l2 --> 7701.605 | Lr --> 0.001 | Seconds_per_step --> 9.561 |
[2024-10-20 21:26:13,779][Main][INFO] - [train] Step 1100 out of 65536 | Loss --> 19.443 | Loss_ntp --> 9.626 | Loss_mlm --> 9.817 | Grad_l2 --> 75.473 | Weights_l2 --> 7701.599 | Lr --> 0.001 | Seconds_per_step --> 9.687 |
[2024-10-20 21:30:13,762][Main][INFO] - [train] Step 1125 out of 65536 | Loss --> 17.771 | Loss_ntp --> 8.793 | Loss_mlm --> 8.978 | Grad_l2 --> 55.492 | Weights_l2 --> 7701.593 | Lr --> 0.001 | Seconds_per_step --> 9.599 |
[2024-10-20 21:34:14,478][Main][INFO] - [train] Step 1150 out of 65536 | Loss --> 17.092 | Loss_ntp --> 8.462 | Loss_mlm --> 8.630 | Grad_l2 --> 72.673 | Weights_l2 --> 7701.587 | Lr --> 0.001 | Seconds_per_step --> 9.629 |
[2024-10-20 21:38:14,797][Main][INFO] - [train] Step 1175 out of 65536 | Loss --> 16.731 | Loss_ntp --> 8.294 | Loss_mlm --> 8.437 | Grad_l2 --> 60.718 | Weights_l2 --> 7701.582 | Lr --> 0.001 | Seconds_per_step --> 9.613 |
[2024-10-20 21:42:15,467][Main][INFO] - [train] Step 1200 out of 65536 | Loss --> 16.522 | Loss_ntp --> 8.188 | Loss_mlm --> 8.334 | Grad_l2 --> 62.414 | Weights_l2 --> 7701.577 | Lr --> 0.001 | Seconds_per_step --> 9.627 |
[2024-10-20 21:46:15,957][Main][INFO] - [train] Step 1225 out of 65536 | Loss --> 16.336 | Loss_ntp --> 8.096 | Loss_mlm --> 8.240 | Grad_l2 --> 57.944 | Weights_l2 --> 7701.572 | Lr --> 
0.001 | Seconds_per_step --> 9.619 |
[2024-10-20 21:50:15,276][Main][INFO] - [train] Step 1250 out of 65536 | Loss --> 16.167 | Loss_ntp --> 8.006 | Loss_mlm --> 8.161 | Grad_l2 --> 42.899 | Weights_l2 --> 7701.567 | Lr --> 0.001 | Seconds_per_step --> 9.573 |
[2024-10-20 21:54:18,039][Main][INFO] - [train] Step 1275 out of 65536 | Loss --> 16.183 | Loss_ntp --> 8.017 | Loss_mlm --> 8.166 | Grad_l2 --> 48.492 | Weights_l2 --> 7701.563 | Lr --> 0.001 | Seconds_per_step --> 9.710 |
[2024-10-20 21:58:18,396][Main][INFO] - [train] Step 1300 out of 65536 | Loss --> 15.988 | Loss_ntp --> 7.926 | Loss_mlm --> 8.063 | Grad_l2 --> 42.852 | Weights_l2 --> 7701.558 | Lr --> 0.001 | Seconds_per_step --> 9.614 |
[2024-10-20 22:02:20,263][Main][INFO] - [train] Step 1325 out of 65536 | Loss --> 15.982 | Loss_ntp --> 7.916 | Loss_mlm --> 8.066 | Grad_l2 --> 47.218 | Weights_l2 --> 7701.553 | Lr --> 0.001 | Seconds_per_step --> 9.675 |
[2024-10-20 22:06:20,739][Main][INFO] - [train] Step 1350 out of 65536 | Loss --> 15.830 | Loss_ntp --> 7.838 | Loss_mlm --> 7.992 | Grad_l2 --> 28.805 | Weights_l2 --> 7701.549 | Lr --> 0.001 | Seconds_per_step --> 9.619 |
[2024-10-20 22:10:23,190][Main][INFO] - [train] Step 1375 out of 65536 | Loss --> 15.806 | Loss_ntp --> 7.839 | Loss_mlm --> 7.967 | Grad_l2 --> 37.388 | Weights_l2 --> 7701.544 | Lr --> 0.001 | Seconds_per_step --> 9.698 |
[2024-10-20 22:14:23,525][Main][INFO] - [train] Step 1400 out of 65536 | Loss --> 15.775 | Loss_ntp --> 7.813 | Loss_mlm --> 7.962 | Grad_l2 --> 35.380 | Weights_l2 --> 7701.540 | Lr --> 0.001 | Seconds_per_step --> 9.613 |
[2024-10-20 22:18:25,080][Main][INFO] - [train] Step 1425 out of 65536 | Loss --> 15.722 | Loss_ntp --> 7.794 | Loss_mlm --> 7.928 | Grad_l2 --> 34.978 | Weights_l2 --> 7701.535 | Lr --> 0.001 | Seconds_per_step --> 9.662 |
[2024-10-20 22:22:24,651][Main][INFO] - [train] Step 1450 out of 65536 | Loss --> 15.638 | Loss_ntp --> 7.739 | Loss_mlm --> 7.899 | Grad_l2 --> 24.003 | Weights_l2 --> 
7701.530 | Lr --> 0.001 | Seconds_per_step --> 9.583 |
[2024-10-20 22:26:24,495][Main][INFO] - [train] Step 1475 out of 65536 | Loss --> 15.682 | Loss_ntp --> 7.768 | Loss_mlm --> 7.913 | Grad_l2 --> 27.599 | Weights_l2 --> 7701.526 | Lr --> 0.001 | Seconds_per_step --> 9.594 |
[2024-10-20 22:30:25,992][Main][INFO] - [train] Step 1500 out of 65536 | Loss --> 15.638 | Loss_ntp --> 7.754 | Loss_mlm --> 7.884 | Grad_l2 --> 22.985 | Weights_l2 --> 7701.521 | Lr --> 0.001 | Seconds_per_step --> 9.660 |
[2024-10-20 22:30:54,697][Main][INFO] - [eval] Step 1500 out of 65536 | Loss --> 15.664 | Loss_ntp --> 7.782 | Loss_mlm --> 7.882 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.700 |
[2024-10-20 22:30:54,709][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-1500
[2024-10-20 22:30:54,719][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-10-20 22:30:59,988][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-1500/model.safetensors
[2024-10-20 22:31:08,673][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-1500/optimizer.bin
[2024-10-20 22:31:08,682][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-1500/scheduler.bin
[2024-10-20 22:31:08,684][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-1500/sampler.bin
[2024-10-20 22:31:08,686][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-1500/sampler_1.bin
[2024-10-20 22:31:08,694][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-1500/random_states_0.pkl
[2024-10-20 22:35:09,885][Main][INFO] - [train] Step 1525 out of 65536 | Loss --> 15.740 | Loss_ntp --> 7.803 | Loss_mlm --> 7.937 | Grad_l2 --> 35.476 | Weights_l2 --> 7701.516 | 
Lr --> 0.001 | Seconds_per_step --> 10.207 |
[2024-10-20 22:39:10,189][Main][INFO] - [train] Step 1550 out of 65536 | Loss --> 15.717 | Loss_ntp --> 7.796 | Loss_mlm --> 7.921 | Grad_l2 --> 32.209 | Weights_l2 --> 7701.511 | Lr --> 0.001 | Seconds_per_step --> 9.612 |
[2024-10-20 22:43:12,020][Main][INFO] - [train] Step 1575 out of 65536 | Loss --> 15.723 | Loss_ntp --> 7.805 | Loss_mlm --> 7.918 | Grad_l2 --> 35.393 | Weights_l2 --> 7701.506 | Lr --> 0.001 | Seconds_per_step --> 9.673 |
[2024-10-20 22:47:13,492][Main][INFO] - [train] Step 1600 out of 65536 | Loss --> 15.617 | Loss_ntp --> 7.752 | Loss_mlm --> 7.865 | Grad_l2 --> 29.357 | Weights_l2 --> 7701.502 | Lr --> 0.001 | Seconds_per_step --> 9.659 |
[2024-10-20 22:51:13,978][Main][INFO] - [train] Step 1625 out of 65536 | Loss --> 15.532 | Loss_ntp --> 7.709 | Loss_mlm --> 7.822 | Grad_l2 --> 18.501 | Weights_l2 --> 7701.497 | Lr --> 0.001 | Seconds_per_step --> 9.619 |
[2024-10-20 22:55:14,600][Main][INFO] - [train] Step 1650 out of 65536 | Loss --> 15.565 | Loss_ntp --> 7.720 | Loss_mlm --> 7.845 | Grad_l2 --> 17.546 | Weights_l2 --> 7701.493 | Lr --> 0.001 | Seconds_per_step --> 9.625 |
[2024-10-20 22:59:14,384][Main][INFO] - [train] Step 1675 out of 65536 | Loss --> 15.576 | Loss_ntp --> 7.737 | Loss_mlm --> 7.838 | Grad_l2 --> 23.599 | Weights_l2 --> 7701.489 | Lr --> 0.001 | Seconds_per_step --> 9.591 |
[2024-10-20 23:03:16,878][Main][INFO] - [train] Step 1700 out of 65536 | Loss --> 15.612 | Loss_ntp --> 7.757 | Loss_mlm --> 7.855 | Grad_l2 --> 28.685 | Weights_l2 --> 7701.484 | Lr --> 0.001 | Seconds_per_step --> 9.700 |
[2024-10-20 23:07:16,611][Main][INFO] - [train] Step 1725 out of 65536 | Loss --> 15.590 | Loss_ntp --> 7.728 | Loss_mlm --> 7.861 | Grad_l2 --> 22.357 | Weights_l2 --> 7701.479 | Lr --> 0.001 | Seconds_per_step --> 9.589 |
[2024-10-20 23:11:18,435][Main][INFO] - [train] Step 1750 out of 65536 | Loss --> 15.475 | Loss_ntp --> 7.683 | Loss_mlm --> 7.792 | Grad_l2 --> 20.808 | 
Weights_l2 --> 7701.475 | Lr --> 0.001 | Seconds_per_step --> 9.673 |
[2024-10-20 23:15:17,324][Main][INFO] - [train] Step 1775 out of 65536 | Loss --> 15.422 | Loss_ntp --> 7.655 | Loss_mlm --> 7.767 | Grad_l2 --> 16.928 | Weights_l2 --> 7701.470 | Lr --> 0.001 | Seconds_per_step --> 9.555 |
[2024-10-20 23:19:17,823][Main][INFO] - [train] Step 1800 out of 65536 | Loss --> 15.370 | Loss_ntp --> 7.625 | Loss_mlm --> 7.745 | Grad_l2 --> 16.147 | Weights_l2 --> 7701.466 | Lr --> 0.001 | Seconds_per_step --> 9.620 |
[2024-10-20 23:23:19,005][Main][INFO] - [train] Step 1825 out of 65536 | Loss --> 15.363 | Loss_ntp --> 7.629 | Loss_mlm --> 7.734 | Grad_l2 --> 19.934 | Weights_l2 --> 7701.462 | Lr --> 0.001 | Seconds_per_step --> 9.647 |
[2024-10-20 23:27:17,933][Main][INFO] - [train] Step 1850 out of 65536 | Loss --> 15.347 | Loss_ntp --> 7.616 | Loss_mlm --> 7.732 | Grad_l2 --> 25.592 | Weights_l2 --> 7701.457 | Lr --> 0.001 | Seconds_per_step --> 9.557 |
[2024-10-20 23:31:19,805][Main][INFO] - [train] Step 1875 out of 65536 | Loss --> 15.254 | Loss_ntp --> 7.577 | Loss_mlm --> 7.677 | Grad_l2 --> 19.500 | Weights_l2 --> 7701.453 | Lr --> 0.001 | Seconds_per_step --> 9.675 |
[2024-10-20 23:35:18,582][Main][INFO] - [train] Step 1900 out of 65536 | Loss --> 15.204 | Loss_ntp --> 7.550 | Loss_mlm --> 7.653 | Grad_l2 --> 15.358 | Weights_l2 --> 7701.448 | Lr --> 0.001 | Seconds_per_step --> 9.551 |
[2024-10-20 23:39:20,300][Main][INFO] - [train] Step 1925 out of 65536 | Loss --> 15.153 | Loss_ntp --> 7.525 | Loss_mlm --> 7.628 | Grad_l2 --> 13.241 | Weights_l2 --> 7701.445 | Lr --> 0.001 | Seconds_per_step --> 9.669 |
[2024-10-20 23:43:21,680][Main][INFO] - [train] Step 1950 out of 65536 | Loss --> 15.111 | Loss_ntp --> 7.497 | Loss_mlm --> 7.614 | Grad_l2 --> 13.357 | Weights_l2 --> 7701.441 | Lr --> 0.001 | Seconds_per_step --> 9.655 |
[2024-10-20 23:47:22,111][Main][INFO] - [train] Step 1975 out of 65536 | Loss --> 15.072 | Loss_ntp --> 7.475 | Loss_mlm --> 7.597 | 
Grad_l2 --> 15.485 | Weights_l2 --> 7701.437 | Lr --> 0.001 | Seconds_per_step --> 9.617 |
[2024-10-20 23:51:21,960][Main][INFO] - [train] Step 2000 out of 65536 | Loss --> 15.061 | Loss_ntp --> 7.470 | Loss_mlm --> 7.591 | Grad_l2 --> 15.511 | Weights_l2 --> 7701.432 | Lr --> 0.001 | Seconds_per_step --> 9.594 |
[2024-10-20 23:51:50,849][Main][INFO] - [eval] Step 2000 out of 65536 | Loss --> 15.092 | Loss_ntp --> 7.501 | Loss_mlm --> 7.591 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.883 |
[2024-10-20 23:55:53,490][Main][INFO] - [train] Step 2025 out of 65536 | Loss --> 15.080 | Loss_ntp --> 7.479 | Loss_mlm --> 7.601 | Grad_l2 --> 17.451 | Weights_l2 --> 7701.428 | Lr --> 0.001 | Seconds_per_step --> 9.705 |
[2024-10-20 23:59:53,747][Main][INFO] - [train] Step 2050 out of 65536 | Loss --> 14.998 | Loss_ntp --> 7.447 | Loss_mlm --> 7.551 | Grad_l2 --> 13.242 | Weights_l2 --> 7701.424 | Lr --> 0.001 | Seconds_per_step --> 9.610 |
[2024-10-21 00:03:57,114][Main][INFO] - [train] Step 2075 out of 65536 | Loss --> 14.994 | Loss_ntp --> 7.431 | Loss_mlm --> 7.562 | Grad_l2 --> 17.409 | Weights_l2 --> 7701.419 | Lr --> 0.001 | Seconds_per_step --> 9.735 |
[2024-10-21 00:07:56,557][Main][INFO] - [train] Step 2100 out of 65536 | Loss --> 14.993 | Loss_ntp --> 7.437 | Loss_mlm --> 7.556 | Grad_l2 --> 23.374 | Weights_l2 --> 7701.414 | Lr --> 0.001 | Seconds_per_step --> 9.578 |
[2024-10-21 00:11:56,818][Main][INFO] - [train] Step 2125 out of 65536 | Loss --> 14.963 | Loss_ntp --> 7.428 | Loss_mlm --> 7.535 | Grad_l2 --> 24.857 | Weights_l2 --> 7701.410 | Lr --> 0.001 | Seconds_per_step --> 9.610 |
[2024-10-21 00:15:56,927][Main][INFO] - [train] Step 2150 out of 65536 | Loss --> 14.829 | Loss_ntp --> 7.354 | Loss_mlm --> 7.474 | Grad_l2 --> 14.538 | Weights_l2 --> 7701.405 | Lr --> 0.001 | Seconds_per_step --> 9.604 |
[2024-10-21 00:19:57,089][Main][INFO] - [train] Step 2175 out of 65536 | Loss --> 14.797 | Loss_ntp --> 7.344 | 
Loss_mlm --> 7.453 | Grad_l2 --> 13.598 | Weights_l2 --> 7701.400 | Lr --> 0.001 | Seconds_per_step --> 9.606 |
[2024-10-21 00:23:58,135][Main][INFO] - [train] Step 2200 out of 65536 | Loss --> 14.774 | Loss_ntp --> 7.321 | Loss_mlm --> 7.454 | Grad_l2 --> 13.339 | Weights_l2 --> 7701.396 | Lr --> 0.001 | Seconds_per_step --> 9.642 |
[2024-10-21 00:27:58,499][Main][INFO] - [train] Step 2225 out of 65536 | Loss --> 14.671 | Loss_ntp --> 7.284 | Loss_mlm --> 7.387 | Grad_l2 --> 13.884 | Weights_l2 --> 7701.392 | Lr --> 0.001 | Seconds_per_step --> 9.614 |
[2024-10-21 00:31:59,596][Main][INFO] - [train] Step 2250 out of 65536 | Loss --> 14.635 | Loss_ntp --> 7.264 | Loss_mlm --> 7.371 | Grad_l2 --> 11.527 | Weights_l2 --> 7701.388 | Lr --> 0.001 | Seconds_per_step --> 9.644 |
[2024-10-21 00:35:58,256][Main][INFO] - [train] Step 2275 out of 65536 | Loss --> 14.593 | Loss_ntp --> 7.247 | Loss_mlm --> 7.345 | Grad_l2 --> 9.993 | Weights_l2 --> 7701.384 | Lr --> 0.001 | Seconds_per_step --> 9.546 |
[2024-10-21 00:39:59,379][Main][INFO] - [train] Step 2300 out of 65536 | Loss --> 14.543 | Loss_ntp --> 7.216 | Loss_mlm --> 7.327 | Grad_l2 --> 12.147 | Weights_l2 --> 7701.381 | Lr --> 0.001 | Seconds_per_step --> 9.644 |
[2024-10-21 00:43:59,080][Main][INFO] - [train] Step 2325 out of 65536 | Loss --> 14.577 | Loss_ntp --> 7.231 | Loss_mlm --> 7.345 | Grad_l2 --> 12.365 | Weights_l2 --> 7701.376 | Lr --> 0.001 | Seconds_per_step --> 9.588 |
[2024-10-21 00:47:59,811][Main][INFO] - [train] Step 2350 out of 65536 | Loss --> 14.512 | Loss_ntp --> 7.202 | Loss_mlm --> 7.310 | Grad_l2 --> 12.472 | Weights_l2 --> 7701.372 | Lr --> 0.001 | Seconds_per_step --> 9.629 |
[2024-10-21 00:51:58,749][Main][INFO] - [train] Step 2375 out of 65536 | Loss --> 14.434 | Loss_ntp --> 7.166 | Loss_mlm --> 7.268 | Grad_l2 --> 12.198 | Weights_l2 --> 7701.368 | Lr --> 0.001 | Seconds_per_step --> 9.557 |
[2024-10-21 00:55:58,527][Main][INFO] - [train] Step 2400 out of 65536 | Loss --> 14.390 | 
Loss_ntp --> 7.141 | Loss_mlm --> 7.249 | Grad_l2 --> 11.488 | Weights_l2 --> 7701.365 | Lr --> 0.001 | Seconds_per_step --> 9.591 |
[2024-10-21 00:59:59,746][Main][INFO] - [train] Step 2425 out of 65536 | Loss --> 14.396 | Loss_ntp --> 7.142 | Loss_mlm --> 7.253 | Grad_l2 --> 11.924 | Weights_l2 --> 7701.361 | Lr --> 0.001 | Seconds_per_step --> 9.649 |
[2024-10-21 01:03:58,922][Main][INFO] - [train] Step 2450 out of 65536 | Loss --> 14.319 | Loss_ntp --> 7.108 | Loss_mlm --> 7.211 | Grad_l2 --> 11.587 | Weights_l2 --> 7701.357 | Lr --> 0.001 | Seconds_per_step --> 9.567 |
[2024-10-21 01:08:00,577][Main][INFO] - [train] Step 2475 out of 65536 | Loss --> 14.363 | Loss_ntp --> 7.132 | Loss_mlm --> 7.231 | Grad_l2 --> 11.854 | Weights_l2 --> 7701.353 | Lr --> 0.001 | Seconds_per_step --> 9.666 |
[2024-10-21 01:12:00,070][Main][INFO] - [train] Step 2500 out of 65536 | Loss --> 14.333 | Loss_ntp --> 7.121 | Loss_mlm --> 7.212 | Grad_l2 --> 10.363 | Weights_l2 --> 7701.349 | Lr --> 0.001 | Seconds_per_step --> 9.580 |
[2024-10-21 01:12:28,480][Main][INFO] - [eval] Step 2500 out of 65536 | Loss --> 14.573 | Loss_ntp --> 7.286 | Loss_mlm --> 7.287 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.404 |
[2024-10-21 01:16:30,064][Main][INFO] - [train] Step 2525 out of 65536 | Loss --> 14.280 | Loss_ntp --> 7.089 | Loss_mlm --> 7.192 | Grad_l2 --> 13.178 | Weights_l2 --> 7701.345 | Lr --> 0.001 | Seconds_per_step --> 9.663 |
[2024-10-21 01:20:29,018][Main][INFO] - [train] Step 2550 out of 65536 | Loss --> 14.260 | Loss_ntp --> 7.091 | Loss_mlm --> 7.169 | Grad_l2 --> 12.381 | Weights_l2 --> 7701.341 | Lr --> 0.001 | Seconds_per_step --> 9.558 |
[2024-10-21 01:24:31,253][Main][INFO] - [train] Step 2575 out of 65536 | Loss --> 14.259 | Loss_ntp --> 7.078 | Loss_mlm --> 7.182 | Grad_l2 --> 11.247 | Weights_l2 --> 7701.337 | Lr --> 0.001 | Seconds_per_step --> 9.689 |
[2024-10-21 01:28:31,446][Main][INFO] - [train] Step 2600 out of 65536 | Loss 
--> 14.259 | Loss_ntp --> 7.080 | Loss_mlm --> 7.179 | Grad_l2 --> 12.524 | Weights_l2 --> 7701.333 | Lr --> 0.001 | Seconds_per_step --> 9.608 |
[2024-10-21 01:32:31,794][Main][INFO] - [train] Step 2625 out of 65536 | Loss --> 14.245 | Loss_ntp --> 7.068 | Loss_mlm --> 7.178 | Grad_l2 --> 12.087 | Weights_l2 --> 7701.330 | Lr --> 0.001 | Seconds_per_step --> 9.614 |
[2024-10-21 01:36:32,411][Main][INFO] - [train] Step 2650 out of 65536 | Loss --> 14.247 | Loss_ntp --> 7.074 | Loss_mlm --> 7.173 | Grad_l2 --> 11.638 | Weights_l2 --> 7701.326 | Lr --> 0.001 | Seconds_per_step --> 9.625 |
[2024-10-21 01:40:33,462][Main][INFO] - [train] Step 2675 out of 65536 | Loss --> 14.274 | Loss_ntp --> 7.086 | Loss_mlm --> 7.189 | Grad_l2 --> 10.415 | Weights_l2 --> 7701.322 | Lr --> 0.001 | Seconds_per_step --> 9.642 |
[2024-10-21 01:44:33,254][Main][INFO] - [train] Step 2700 out of 65536 | Loss --> 14.276 | Loss_ntp --> 7.097 | Loss_mlm --> 7.179 | Grad_l2 --> 10.830 | Weights_l2 --> 7701.318 | Lr --> 0.001 | Seconds_per_step --> 9.592 |
[2024-10-21 01:48:34,104][Main][INFO] - [train] Step 2725 out of 65536 | Loss --> 14.322 | Loss_ntp --> 7.117 | Loss_mlm --> 7.205 | Grad_l2 --> 11.668 | Weights_l2 --> 7701.314 | Lr --> 0.001 | Seconds_per_step --> 9.634 |
[2024-10-21 01:52:33,834][Main][INFO] - [train] Step 2750 out of 65536 | Loss --> 14.393 | Loss_ntp --> 7.149 | Loss_mlm --> 7.244 | Grad_l2 --> 10.585 | Weights_l2 --> 7701.310 | Lr --> 0.001 | Seconds_per_step --> 9.589 |
[2024-10-21 01:56:33,130][Main][INFO] - [train] Step 2775 out of 65536 | Loss --> 14.326 | Loss_ntp --> 7.124 | Loss_mlm --> 7.202 | Grad_l2 --> 9.862 | Weights_l2 --> 7701.306 | Lr --> 0.001 | Seconds_per_step --> 9.572 |
[2024-10-21 02:00:34,375][Main][INFO] - [train] Step 2800 out of 65536 | Loss --> 14.354 | Loss_ntp --> 7.134 | Loss_mlm --> 7.220 | Grad_l2 --> 8.484 | Weights_l2 --> 7701.302 | Lr --> 0.001 | Seconds_per_step --> 9.650 |
[2024-10-21 02:04:34,763][Main][INFO] - [train] Step 2825 out 
of 65536 | Loss --> 14.320 | Loss_ntp --> 7.118 | Loss_mlm --> 7.202 | Grad_l2 --> 11.118 | Weights_l2 --> 7701.298 | Lr --> 0.001 | Seconds_per_step --> 9.615 |
[2024-10-21 02:08:35,157][Main][INFO] - [train] Step 2850 out of 65536 | Loss --> 14.323 | Loss_ntp --> 7.124 | Loss_mlm --> 7.199 | Grad_l2 --> 10.821 | Weights_l2 --> 7701.294 | Lr --> 0.001 | Seconds_per_step --> 9.616 |
[2024-10-21 02:12:34,860][Main][INFO] - [train] Step 2875 out of 65536 | Loss --> 14.348 | Loss_ntp --> 7.129 | Loss_mlm --> 7.219 | Grad_l2 --> 9.481 | Weights_l2 --> 7701.291 | Lr --> 0.001 | Seconds_per_step --> 9.588 |
[2024-10-21 02:16:36,448][Main][INFO] - [train] Step 2900 out of 65536 | Loss --> 14.413 | Loss_ntp --> 7.163 | Loss_mlm --> 7.250 | Grad_l2 --> 10.586 | Weights_l2 --> 7701.287 | Lr --> 0.001 | Seconds_per_step --> 9.663 |
[2024-10-21 02:20:36,563][Main][INFO] - [train] Step 2925 out of 65536 | Loss --> 14.319 | Loss_ntp --> 7.113 | Loss_mlm --> 7.206 | Grad_l2 --> 9.175 | Weights_l2 --> 7701.283 | Lr --> 0.001 | Seconds_per_step --> 9.604 |
[2024-10-21 02:24:36,522][Main][INFO] - [train] Step 2950 out of 65536 | Loss --> 14.292 | Loss_ntp --> 7.112 | Loss_mlm --> 7.179 | Grad_l2 --> 10.380 | Weights_l2 --> 7701.279 | Lr --> 0.001 | Seconds_per_step --> 9.598 |
[2024-10-21 02:28:36,510][Main][INFO] - [train] Step 2975 out of 65536 | Loss --> 14.202 | Loss_ntp --> 7.068 | Loss_mlm --> 7.134 | Grad_l2 --> 9.622 | Weights_l2 --> 7701.276 | Lr --> 0.001 | Seconds_per_step --> 9.599 |
[2024-10-21 02:32:38,120][Main][INFO] - [train] Step 3000 out of 65536 | Loss --> 14.214 | Loss_ntp --> 7.066 | Loss_mlm --> 7.147 | Grad_l2 --> 10.228 | Weights_l2 --> 7701.272 | Lr --> 0.001 | Seconds_per_step --> 9.664 |
[2024-10-21 02:33:06,984][Main][INFO] - [eval] Step 3000 out of 65536 | Loss --> 14.236 | Loss_ntp --> 7.111 | Loss_mlm --> 7.125 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.858 |
[2024-10-21 
02:33:06,988][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-3000
[2024-10-21 02:33:07,000][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-10-21 02:33:13,140][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-3000/model.safetensors
[2024-10-21 02:33:21,968][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-3000/optimizer.bin
[2024-10-21 02:33:21,978][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-3000/scheduler.bin
[2024-10-21 02:33:21,979][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-3000/sampler.bin
[2024-10-21 02:33:21,981][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-3000/sampler_1.bin
[2024-10-21 02:33:21,990][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-3000/random_states_0.pkl
[2024-10-21 02:37:21,949][Main][INFO] - [train] Step 3025 out of 65536 | Loss --> 14.180 | Loss_ntp --> 7.041 | Loss_mlm --> 7.138 | Grad_l2 --> 9.928 | Weights_l2 --> 7701.268 | Lr --> 0.001 | Seconds_per_step --> 10.198 |
[2024-10-21 02:41:23,436][Main][INFO] - [train] Step 3050 out of 65536 | Loss --> 14.163 | Loss_ntp --> 7.032 | Loss_mlm --> 7.130 | Grad_l2 --> 9.909 | Weights_l2 --> 7701.264 | Lr --> 0.001 | Seconds_per_step --> 9.659 |
[2024-10-21 02:45:23,362][Main][INFO] - [train] Step 3075 out of 65536 | Loss --> 14.109 | Loss_ntp --> 7.016 | Loss_mlm --> 7.093 | Grad_l2 --> 10.119 | Weights_l2 --> 7701.260 | Lr --> 0.001 | Seconds_per_step --> 9.597 |
[2024-10-21 02:49:23,828][Main][INFO] - [train] Step 3100 out of 65536 | Loss --> 14.053 | Loss_ntp --> 6.981 | Loss_mlm --> 7.072 | Grad_l2 --> 8.917 | Weights_l2 --> 7701.256 | Lr --> 0.001 | Seconds_per_step --> 9.619 |
[2024-10-21 
02:53:26,144][Main][INFO] - [train] Step 3125 out of 65536 | Loss --> 14.045 | Loss_ntp --> 6.975 | Loss_mlm --> 7.069 | Grad_l2 --> 11.184 | Weights_l2 --> 7701.252 | Lr --> 0.001 | Seconds_per_step --> 9.692 |
[2024-10-21 02:57:25,035][Main][INFO] - [train] Step 3150 out of 65536 | Loss --> 14.006 | Loss_ntp --> 6.959 | Loss_mlm --> 7.047 | Grad_l2 --> 9.280 | Weights_l2 --> 7701.248 | Lr --> 0.001 | Seconds_per_step --> 9.555 |
[2024-10-21 03:01:27,283][Main][INFO] - [train] Step 3175 out of 65536 | Loss --> 13.943 | Loss_ntp --> 6.924 | Loss_mlm --> 7.020 | Grad_l2 --> 8.769 | Weights_l2 --> 7701.245 | Lr --> 0.001 | Seconds_per_step --> 9.690 |
[2024-10-21 03:05:27,701][Main][INFO] - [train] Step 3200 out of 65536 | Loss --> 13.956 | Loss_ntp --> 6.916 | Loss_mlm --> 7.040 | Grad_l2 --> 8.625 | Weights_l2 --> 7701.241 | Lr --> 0.001 | Seconds_per_step --> 9.617 |
[2024-10-21 03:09:28,530][Main][INFO] - [train] Step 3225 out of 65536 | Loss --> 13.916 | Loss_ntp --> 6.906 | Loss_mlm --> 7.010 | Grad_l2 --> 9.378 | Weights_l2 --> 7701.238 | Lr --> 0.001 | Seconds_per_step --> 9.633 |
[2024-10-21 03:13:28,937][Main][INFO] - [train] Step 3250 out of 65536 | Loss --> 13.849 | Loss_ntp --> 6.867 | Loss_mlm --> 6.982 | Grad_l2 --> 9.221 | Weights_l2 --> 7701.234 | Lr --> 0.001 | Seconds_per_step --> 9.616 |
[2024-10-21 03:17:29,597][Main][INFO] - [train] Step 3275 out of 65536 | Loss --> 13.854 | Loss_ntp --> 6.869 | Loss_mlm --> 6.985 | Grad_l2 --> 8.561 | Weights_l2 --> 7701.230 | Lr --> 0.001 | Seconds_per_step --> 9.626 |
[2024-10-21 03:21:30,034][Main][INFO] - [train] Step 3300 out of 65536 | Loss --> 13.781 | Loss_ntp --> 6.843 | Loss_mlm --> 6.938 | Grad_l2 --> 8.919 | Weights_l2 --> 7701.226 | Lr --> 0.001 | Seconds_per_step --> 9.617 |
[2024-10-21 03:25:29,815][Main][INFO] - [train] Step 3325 out of 65536 | Loss --> 13.766 | Loss_ntp --> 6.836 | Loss_mlm --> 6.930 | Grad_l2 --> 8.129 | Weights_l2 --> 7701.223 | Lr --> 0.001 | Seconds_per_step --> 9.591 |
[2024-10-21 03:29:30,344][Main][INFO] - [train] Step 3350 out of 65536 | Loss --> 13.726 | Loss_ntp --> 6.809 | Loss_mlm --> 6.917 | Grad_l2 --> 9.145 | Weights_l2 --> 7701.219 | Lr --> 0.001 | Seconds_per_step --> 9.620 | [2024-10-21 03:33:30,171][Main][INFO] - [train] Step 3375 out of 65536 | Loss --> 13.751 | Loss_ntp --> 6.819 | Loss_mlm --> 6.932 | Grad_l2 --> 11.666 | Weights_l2 --> 7701.215 | Lr --> 0.001 | Seconds_per_step --> 9.593 | [2024-10-21 03:37:32,111][Main][INFO] - [train] Step 3400 out of 65536 | Loss --> 13.700 | Loss_ntp --> 6.796 | Loss_mlm --> 6.905 | Grad_l2 --> 8.776 | Weights_l2 --> 7701.211 | Lr --> 0.001 | Seconds_per_step --> 9.677 | [2024-10-21 03:41:31,530][Main][INFO] - [train] Step 3425 out of 65536 | Loss --> 13.641 | Loss_ntp --> 6.774 | Loss_mlm --> 6.868 | Grad_l2 --> 9.206 | Weights_l2 --> 7701.207 | Lr --> 0.001 | Seconds_per_step --> 9.577 | [2024-10-21 03:45:33,625][Main][INFO] - [train] Step 3450 out of 65536 | Loss --> 13.588 | Loss_ntp --> 6.735 | Loss_mlm --> 6.852 | Grad_l2 --> 6.293 | Weights_l2 --> 7701.204 | Lr --> 0.001 | Seconds_per_step --> 9.684 | [2024-10-21 03:49:34,400][Main][INFO] - [train] Step 3475 out of 65536 | Loss --> 13.615 | Loss_ntp --> 6.748 | Loss_mlm --> 6.868 | Grad_l2 --> 9.161 | Weights_l2 --> 7701.201 | Lr --> 0.001 | Seconds_per_step --> 9.631 | [2024-10-21 03:53:35,824][Main][INFO] - [train] Step 3500 out of 65536 | Loss --> 13.532 | Loss_ntp --> 6.707 | Loss_mlm --> 6.825 | Grad_l2 --> 9.556 | Weights_l2 --> 7701.197 | Lr --> 0.001 | Seconds_per_step --> 9.657 | [2024-10-21 03:54:04,713][Main][INFO] - [eval] Step 3500 out of 65536 | Loss --> 13.912 | Loss_ntp --> 6.950 | Loss_mlm --> 6.962 | Accuracy_mlm --> 0.000 | Accuracy_ntp --> 0.000 | Accuracy --> 0.000 | Time --> 28.883 | [2024-10-21 03:58:05,620][Main][INFO] - [train] Step 3525 out of 65536 | Loss --> 13.463 | Loss_ntp --> 6.677 | Loss_mlm --> 6.786 | Grad_l2 --> 9.458 | Weights_l2 --> 7701.193 | Lr --> 0.001 | Seconds_per_step --> 
9.636 | [2024-10-21 04:02:06,516][Main][INFO] - [train] Step 3550 out of 65536 | Loss --> 13.419 | Loss_ntp --> 6.654 | Loss_mlm --> 6.766 | Grad_l2 --> 9.819 | Weights_l2 --> 7701.188 | Lr --> 0.001 | Seconds_per_step --> 9.636 | [2024-10-21 04:06:07,229][Main][INFO] - [train] Step 3575 out of 65536 | Loss --> 13.362 | Loss_ntp --> 6.626 | Loss_mlm --> 6.736 | Grad_l2 --> 8.944 | Weights_l2 --> 7701.184 | Lr --> 0.001 | Seconds_per_step --> 9.628 | [2024-10-21 04:10:08,761][Main][INFO] - [train] Step 3600 out of 65536 | Loss --> 13.401 | Loss_ntp --> 6.628 | Loss_mlm --> 6.773 | Grad_l2 --> 9.904 | Weights_l2 --> 7701.180 | Lr --> 0.001 | Seconds_per_step --> 9.661 | [2024-10-21 04:14:09,815][Main][INFO] - [train] Step 3625 out of 65536 | Loss --> 13.361 | Loss_ntp --> 6.625 | Loss_mlm --> 6.736 | Grad_l2 --> 8.507 | Weights_l2 --> 7701.176 | Lr --> 0.001 | Seconds_per_step --> 9.642 | [2024-10-21 04:18:10,037][Main][INFO] - [train] Step 3650 out of 65536 | Loss --> 13.355 | Loss_ntp --> 6.614 | Loss_mlm --> 6.741 | Grad_l2 --> 9.056 | Weights_l2 --> 7701.172 | Lr --> 0.001 | Seconds_per_step --> 9.609 | [2024-10-21 04:22:10,677][Main][INFO] - [train] Step 3675 out of 65536 | Loss --> 13.306 | Loss_ntp --> 6.586 | Loss_mlm --> 6.720 | Grad_l2 --> 9.057 | Weights_l2 --> 7701.168 | Lr --> 0.001 | Seconds_per_step --> 9.625 | [2024-10-21 04:26:12,857][Main][INFO] - [train] Step 3700 out of 65536 | Loss --> 13.325 | Loss_ntp --> 6.596 | Loss_mlm --> 6.729 | Grad_l2 --> 10.732 | Weights_l2 --> 7701.163 | Lr --> 0.001 | Seconds_per_step --> 9.687 | [2024-10-21 04:30:11,816][Main][INFO] - [train] Step 3725 out of 65536 | Loss --> 13.239 | Loss_ntp --> 6.561 | Loss_mlm --> 6.678 | Grad_l2 --> 9.810 | Weights_l2 --> 7701.160 | Lr --> 0.001 | Seconds_per_step --> 9.558 | [2024-10-21 04:34:12,167][Main][INFO] - [train] Step 3750 out of 65536 | Loss --> 13.211 | Loss_ntp --> 6.534 | Loss_mlm --> 6.677 | Grad_l2 --> 10.011 | Weights_l2 --> 7701.156 | Lr --> 0.001 | 
Seconds_per_step --> 9.614 | [2024-10-21 04:38:14,046][Main][INFO] - [train] Step 3775 out of 65536 | Loss --> 13.214 | Loss_ntp --> 6.537 | Loss_mlm --> 6.678 | Grad_l2 --> 8.939 | Weights_l2 --> 7701.152 | Lr --> 0.001 | Seconds_per_step --> 9.675 | [2024-10-21 04:42:14,454][Main][INFO] - [train] Step 3800 out of 65536 | Loss --> 13.148 | Loss_ntp --> 6.508 | Loss_mlm --> 6.640 | Grad_l2 --> 9.513 | Weights_l2 --> 7701.148 | Lr --> 0.001 | Seconds_per_step --> 9.616 | [2024-10-21 04:46:14,554][Main][INFO] - [train] Step 3825 out of 65536 | Loss --> 13.172 | Loss_ntp --> 6.514 | Loss_mlm --> 6.658 | Grad_l2 --> 9.295 | Weights_l2 --> 7701.144 | Lr --> 0.001 | Seconds_per_step --> 9.604 | [2024-10-21 04:50:14,762][Main][INFO] - [train] Step 3850 out of 65536 | Loss --> 13.118 | Loss_ntp --> 6.494 | Loss_mlm --> 6.624 | Grad_l2 --> 7.890 | Weights_l2 --> 7701.140 | Lr --> 0.001 | Seconds_per_step --> 9.608 | [2024-10-21 04:54:16,032][Main][INFO] - [train] Step 3875 out of 65536 | Loss --> 13.179 | Loss_ntp --> 6.521 | Loss_mlm --> 6.657 | Grad_l2 --> 9.901 | Weights_l2 --> 7701.136 | Lr --> 0.001 | Seconds_per_step --> 9.651 | [2024-10-21 04:58:16,128][Main][INFO] - [train] Step 3900 out of 65536 | Loss --> 13.259 | Loss_ntp --> 6.571 | Loss_mlm --> 6.687 | Grad_l2 --> 8.910 | Weights_l2 --> 7701.132 | Lr --> 0.001 | Seconds_per_step --> 9.604 |