[2024-07-29 11:48:34,867][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[2024-07-29 11:48:34,870][Main][INFO] - Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Mixed precision type: bf16
[2024-07-29 11:48:34,870][Main][INFO] - Working directory is /home/jovyan/nanoT5/logs/2024-07-29/11-48-34-
[2024-07-29 11:48:40,527][Main][INFO] - You are using T5 legacy LR Schedule, it's independent from the optim.base_lr
[2024-07-29 11:54:51,962][Main][INFO] - [train] Step 100 out of 120000 | Loss --> 106.871 | Grad_l2 --> 173.947 | Weights_l2 --> 13631.930 | Lr --> 0.010 | Seconds_per_step --> 3.632 |
[2024-07-29 11:58:14,358][Main][INFO] - [train] Step 200 out of 120000 | Loss --> 91.184 | Grad_l2 --> 105.750 | Weights_l2 --> 13698.571 | Lr --> 0.010 | Seconds_per_step --> 2.024 |
[2024-07-29 12:01:33,386][Main][INFO] - [train] Step 300 out of 120000 | Loss --> 75.068 | Grad_l2 --> 77.045 | Weights_l2 --> 13784.877 | Lr --> 0.010 | Seconds_per_step --> 1.990 |
[2024-07-29 12:04:52,571][Main][INFO] - [train] Step 400 out of 120000 | Loss --> 34.147 | Grad_l2 --> 36.239 | Weights_l2 --> 13867.902 | Lr --> 0.010 | Seconds_per_step --> 1.992 |
[2024-07-29 12:08:18,495][Main][INFO] - [train] Step 500 out of 120000 | Loss --> 8.709 | Grad_l2 --> 8.211 | Weights_l2 --> 13928.035 | Lr --> 0.010 | Seconds_per_step --> 2.059 |
[2024-07-29 12:11:37,901][Main][INFO] - [train] Step 600 out of 120000 | Loss --> 7.802 | Grad_l2 --> 5.002 | Weights_l2 --> 13996.458 | Lr --> 0.010 | Seconds_per_step --> 1.994 |
[2024-07-29 12:14:58,762][Main][INFO] - [train] Step 700 out of 120000 | Loss --> 7.482 | Grad_l2 --> 3.353 | Weights_l2 --> 14064.556 | Lr --> 0.010 | Seconds_per_step --> 2.009 |
[2024-07-29 12:18:23,000][Main][INFO] - [train] Step 800 out of 120000 | Loss --> 7.246 | Grad_l2 --> 2.825 | Weights_l2 --> 14131.687 | Lr --> 0.010 | Seconds_per_step --> 2.042 |
[2024-07-29 12:21:44,472][Main][INFO] - [train] Step 900 out of 120000 | Loss --> 7.061 | Grad_l2 --> 2.305 | Weights_l2 --> 14186.704 | Lr --> 0.010 | Seconds_per_step --> 2.015 |
[2024-07-29 12:25:04,393][Main][INFO] - [train] Step 1000 out of 120000 | Loss --> 6.909 | Grad_l2 --> 2.135 | Weights_l2 --> 14235.237 | Lr --> 0.010 | Seconds_per_step --> 1.999 |
[2024-07-29 12:28:28,256][Main][INFO] - [train] Step 1100 out of 120000 | Loss --> 6.806 | Grad_l2 --> 2.517 | Weights_l2 --> 14278.635 | Lr --> 0.010 | Seconds_per_step --> 2.039 |
[2024-07-29 12:31:47,636][Main][INFO] - [train] Step 1200 out of 120000 | Loss --> 6.712 | Grad_l2 --> 2.421 | Weights_l2 --> 14320.878 | Lr --> 0.010 | Seconds_per_step --> 1.994 |
[2024-07-29 12:35:10,049][Main][INFO] - [train] Step 1300 out of 120000 | Loss --> 6.654 | Grad_l2 --> 1.869 | Weights_l2 --> 14366.013 | Lr --> 0.010 | Seconds_per_step --> 2.024 |
[2024-07-29 12:38:34,751][Main][INFO] - [train] Step 1400 out of 120000 | Loss --> 6.602 | Grad_l2 --> 1.800 | Weights_l2 --> 14410.696 | Lr --> 0.010 | Seconds_per_step --> 2.047 |
[2024-07-29 12:41:53,489][Main][INFO] - [train] Step 1500 out of 120000 | Loss --> 6.564 | Grad_l2 --> 2.195 | Weights_l2 --> 14449.452 | Lr --> 0.010 | Seconds_per_step --> 1.987 |
[2024-07-29 12:45:17,094][Main][INFO] - [train] Step 1600 out of 120000 | Loss --> 6.516 | Grad_l2 --> 2.107 | Weights_l2 --> 14494.476 | Lr --> 0.010 | Seconds_per_step --> 2.036 |
[2024-07-29 12:48:39,636][Main][INFO] - [train] Step 1700 out of 120000 | Loss --> 6.484 | Grad_l2 --> 1.673 | Weights_l2 --> 14540.533 | Lr --> 0.010 | Seconds_per_step --> 2.025 |
[2024-07-29 12:52:01,453][Main][INFO] - [train] Step 1800 out of 120000 | Loss --> 6.427 | Grad_l2 --> 1.748 | Weights_l2 --> 14587.513 | Lr --> 0.010 | Seconds_per_step --> 2.018 |
[2024-07-29 12:55:25,068][Main][INFO] - [train] Step 1900 out of 120000 | Loss --> 6.403 | Grad_l2 --> 1.648 | Weights_l2 --> 14634.954 | Lr --> 0.010 | Seconds_per_step --> 2.036 |
[2024-07-29 12:58:48,176][Main][INFO] - [train] Step 2000 out of 120000 | Loss --> 6.347 | Grad_l2 --> 1.685 | Weights_l2 --> 14683.155 | Lr --> 0.010 | Seconds_per_step --> 2.031 |
[2024-07-29 13:02:09,499][Main][INFO] - [train] Step 2100 out of 120000 | Loss --> 6.312 | Grad_l2 --> 6.106 | Weights_l2 --> 14731.816 | Lr --> 0.010 | Seconds_per_step --> 2.013 |
[2024-07-29 13:05:33,540][Main][INFO] - [train] Step 2200 out of 120000 | Loss --> 6.275 | Grad_l2 --> 1.663 | Weights_l2 --> 14783.776 | Lr --> 0.010 | Seconds_per_step --> 2.040 |
[2024-07-29 13:08:54,507][Main][INFO] - [train] Step 2300 out of 120000 | Loss --> 6.232 | Grad_l2 --> 1.386 | Weights_l2 --> 14837.422 | Lr --> 0.010 | Seconds_per_step --> 2.010 |
[2024-07-29 13:12:16,555][Main][INFO] - [train] Step 2400 out of 120000 | Loss --> 6.191 | Grad_l2 --> 1.571 | Weights_l2 --> 14890.574 | Lr --> 0.010 | Seconds_per_step --> 2.020 |
[2024-07-29 13:15:39,738][Main][INFO] - [train] Step 2500 out of 120000 | Loss --> 6.172 | Grad_l2 --> 1.476 | Weights_l2 --> 14944.926 | Lr --> 0.010 | Seconds_per_step --> 2.032 |
[2024-07-29 13:19:00,542][Main][INFO] - [train] Step 2600 out of 120000 | Loss --> 6.137 | Grad_l2 --> 1.491 | Weights_l2 --> 14998.592 | Lr --> 0.010 | Seconds_per_step --> 2.008 |
[2024-07-29 13:22:22,020][Main][INFO] - [train] Step 2700 out of 120000 | Loss --> 6.096 | Grad_l2 --> 1.361 | Weights_l2 --> 15052.842 | Lr --> 0.010 | Seconds_per_step --> 2.015 |
[2024-07-29 13:25:44,050][Main][INFO] - [train] Step 2800 out of 120000 | Loss --> 6.073 | Grad_l2 --> 1.290 | Weights_l2 --> 15108.141 | Lr --> 0.010 | Seconds_per_step --> 2.020 |
[2024-07-29 13:29:05,652][Main][INFO] - [train] Step 2900 out of 120000 | Loss --> 6.049 | Grad_l2 --> 1.436 | Weights_l2 --> 15164.462 | Lr --> 0.010 | Seconds_per_step --> 2.016 |
[2024-07-29 13:32:29,205][Main][INFO] - [train] Step 3000 out of 120000 | Loss --> 6.025 | Grad_l2 --> 1.178 | Weights_l2 --> 15223.058 | Lr --> 0.010 | Seconds_per_step --> 2.036 |
[2024-07-29 13:35:48,993][Main][INFO] - [train] Step 3100 out of 120000 | Loss --> 6.005 | Grad_l2 --> 1.652 | Weights_l2 --> 15279.546 | Lr --> 0.010 | Seconds_per_step --> 1.998 |
[2024-07-29 13:39:10,579][Main][INFO] - [train] Step 3200 out of 120000 | Loss --> 5.985 | Grad_l2 --> 1.154 | Weights_l2 --> 15337.189 | Lr --> 0.010 | Seconds_per_step --> 2.016 |
[2024-07-29 13:42:32,322][Main][INFO] - [train] Step 3300 out of 120000 | Loss --> 5.959 | Grad_l2 --> 2.190 | Weights_l2 --> 15394.636 | Lr --> 0.010 | Seconds_per_step --> 2.017 |
[2024-07-29 13:45:54,935][Main][INFO] - [train] Step 3400 out of 120000 | Loss --> 5.940 | Grad_l2 --> 1.083 | Weights_l2 --> 15452.219 | Lr --> 0.010 | Seconds_per_step --> 2.026 |
[2024-07-29 13:49:15,641][Main][INFO] - [train] Step 3500 out of 120000 | Loss --> 5.917 | Grad_l2 --> 1.199 | Weights_l2 --> 15507.891 | Lr --> 0.010 | Seconds_per_step --> 2.007 |
[2024-07-29 13:52:37,527][Main][INFO] - [train] Step 3600 out of 120000 | Loss --> 5.904 | Grad_l2 --> 0.973 | Weights_l2 --> 15567.875 | Lr --> 0.010 | Seconds_per_step --> 2.018 |
[2024-07-29 13:55:59,303][Main][INFO] - [train] Step 3700 out of 120000 | Loss --> 5.853 | Grad_l2 --> 0.917 | Weights_l2 --> 15628.298 | Lr --> 0.010 | Seconds_per_step --> 2.018 |
[2024-07-29 13:59:19,746][Main][INFO] - [train] Step 3800 out of 120000 | Loss --> 5.796 | Grad_l2 --> 0.879 | Weights_l2 --> 15689.144 | Lr --> 0.010 | Seconds_per_step --> 2.004 |
[2024-07-29 14:02:42,361][Main][INFO] - [train] Step 3900 out of 120000 | Loss --> 5.758 | Grad_l2 --> 0.820 | Weights_l2 --> 15749.812 | Lr --> 0.010 | Seconds_per_step --> 2.026 |
[2024-07-29 14:06:04,693][Main][INFO] - [train] Step 4000 out of 120000 | Loss --> 5.688 | Grad_l2 --> 0.846 | Weights_l2 --> 15813.621 | Lr --> 0.010 | Seconds_per_step --> 2.023 |
[2024-07-29 14:09:26,539][Main][INFO] - [train] Step 4100 out of 120000 | Loss --> 5.584 | Grad_l2 --> 0.847 | Weights_l2 --> 15880.774 | Lr --> 0.010 | Seconds_per_step --> 2.018 |
[2024-07-29 14:12:47,953][Main][INFO] - [train] Step 4200 out of 120000 | Loss --> 5.482 | Grad_l2 --> 0.768 | Weights_l2 --> 15951.018 | Lr --> 0.010 | Seconds_per_step --> 2.014 |
[2024-07-29 14:16:09,455][Main][INFO] - [train] Step 4300 out of 120000 | Loss --> 5.374 | Grad_l2 --> 1.294 | Weights_l2 --> 16022.493 | Lr --> 0.010 | Seconds_per_step --> 2.015 |
[2024-07-29 14:19:31,865][Main][INFO] - [train] Step 4400 out of 120000 | Loss --> 5.282 | Grad_l2 --> 0.709 | Weights_l2 --> 16095.405 | Lr --> 0.010 | Seconds_per_step --> 2.024 |
[2024-07-29 14:22:54,303][Main][INFO] - [train] Step 4500 out of 120000 | Loss --> 5.212 | Grad_l2 --> 1.059 | Weights_l2 --> 16165.005 | Lr --> 0.010 | Seconds_per_step --> 2.024 |
[2024-07-29 14:26:15,499][Main][INFO] - [train] Step 4600 out of 120000 | Loss --> 5.118 | Grad_l2 --> 0.695 | Weights_l2 --> 16238.645 | Lr --> 0.010 | Seconds_per_step --> 2.012 |
[2024-07-29 14:29:38,652][Main][INFO] - [train] Step 4700 out of 120000 | Loss --> 5.062 | Grad_l2 --> 0.840 | Weights_l2 --> 16309.396 | Lr --> 0.010 | Seconds_per_step --> 2.032 |
[2024-07-29 14:33:01,084][Main][INFO] - [train] Step 4800 out of 120000 | Loss --> 4.985 | Grad_l2 --> 0.653 | Weights_l2 --> 16383.090 | Lr --> 0.010 | Seconds_per_step --> 2.024 |
[2024-07-29 14:36:21,792][Main][INFO] - [train] Step 4900 out of 120000 | Loss --> 4.902 | Grad_l2 --> 0.604 | Weights_l2 --> 16455.734 | Lr --> 0.010 | Seconds_per_step --> 2.007 |
[2024-07-29 14:39:44,455][Main][INFO] - [train] Step 5000 out of 120000 | Loss --> 4.837 | Grad_l2 --> 0.628 | Weights_l2 --> 16527.645 | Lr --> 0.010 | Seconds_per_step --> 2.027 |
[2024-07-29 14:50:48,097][Main][INFO] - [eval] Step 5000 out of 120000 | Loss --> 4.814 | Accuracy --> 0.330 | Time --> 663.639 |
[2024-07-29 14:54:10,907][Main][INFO] - [train] Step 5100 out of 120000 | Loss --> 4.756 | Grad_l2 --> 0.606 | Weights_l2 --> 16599.660 | Lr --> 0.010 | Seconds_per_step --> 2.028 |
[2024-07-29 14:57:33,874][Main][INFO] - [train] Step 5200 out of 120000 | Loss --> 4.693 | Grad_l2 --> 0.695 | Weights_l2 --> 16669.551 | Lr --> 0.010 | Seconds_per_step --> 2.030 |
[2024-07-29 15:00:55,518][Main][INFO] - [train] Step 5300 out of 120000 | Loss --> 4.611 | Grad_l2 --> 0.608 | Weights_l2 --> 16741.980 | Lr --> 0.010 | Seconds_per_step --> 2.016 |
[2024-07-29 15:04:15,619][Main][INFO] - [train] Step 5400 out of 120000 | Loss --> 4.546 | Grad_l2 --> 0.595 | Weights_l2 --> 16814.572 | Lr --> 0.010 | Seconds_per_step --> 2.001 |
[2024-07-29 15:07:39,691][Main][INFO] - [train] Step 5500 out of 120000 | Loss --> 4.455 | Grad_l2 --> 0.599 | Weights_l2 --> 16888.003 | Lr --> 0.010 | Seconds_per_step --> 2.041 |
[2024-07-29 15:10:59,659][Main][INFO] - [train] Step 5600 out of 120000 | Loss --> 4.384 | Grad_l2 --> 0.580 | Weights_l2 --> 16960.988 | Lr --> 0.010 | Seconds_per_step --> 2.000 |
[2024-07-29 15:14:22,735][Main][INFO] - [train] Step 5700 out of 120000 | Loss --> 4.319 | Grad_l2 --> 0.830 | Weights_l2 --> 17028.580 | Lr --> 0.010 | Seconds_per_step --> 2.031 |
[2024-07-29 15:17:44,259][Main][INFO] - [train] Step 5800 out of 120000 | Loss --> 4.243 | Grad_l2 --> 0.646 | Weights_l2 --> 17099.131 | Lr --> 0.010 | Seconds_per_step --> 2.015 |
[2024-07-29 15:21:05,158][Main][INFO] - [train] Step 5900 out of 120000 | Loss --> 4.177 | Grad_l2 --> 0.609 | Weights_l2 --> 17171.035 | Lr --> 0.010 | Seconds_per_step --> 2.009 |
[2024-07-29 15:24:29,302][Main][INFO] - [train] Step 6000 out of 120000 | Loss --> 4.121 | Grad_l2 --> 0.584 | Weights_l2 --> 17242.706 | Lr --> 0.010 | Seconds_per_step --> 2.041 |
[2024-07-29 15:27:51,355][Main][INFO] - [train] Step 6100 out of 120000 | Loss --> 4.057 | Grad_l2 --> 0.588 | Weights_l2 --> 17314.218 | Lr --> 0.010 | Seconds_per_step --> 2.021 |
[2024-07-29 15:31:12,634][Main][INFO] - [train] Step 6200 out of 120000 | Loss --> 4.010 | Grad_l2 --> 0.657 | Weights_l2 --> 17383.413 | Lr --> 0.010 | Seconds_per_step --> 2.013 |
[2024-07-29 15:34:35,991][Main][INFO] - [train] Step 6300 out of 120000 | Loss --> 3.979 | Grad_l2 --> 0.580 | Weights_l2 --> 17453.800 | Lr --> 0.010 | Seconds_per_step --> 2.034 |
[2024-07-29 15:37:58,863][Main][INFO] - [train] Step 6400 out of 120000 | Loss --> 3.923 | Grad_l2 --> 0.595 | Weights_l2 --> 17523.488 | Lr --> 0.010 | Seconds_per_step --> 2.029 |
[2024-07-29 15:41:19,057][Main][INFO] - [train] Step 6500 out of 120000 | Loss --> 3.889 | Grad_l2 --> 0.586 | Weights_l2 --> 17592.760 | Lr --> 0.010 | Seconds_per_step --> 2.002 |
[2024-07-29 15:44:41,058][Main][INFO] - [train] Step 6600 out of 120000 | Loss --> 3.862 | Grad_l2 --> 0.588 | Weights_l2 --> 17662.478 | Lr --> 0.010 | Seconds_per_step --> 2.020 |
[2024-07-29 15:48:07,437][Main][INFO] - [train] Step 6700 out of 120000 | Loss --> 3.811 | Grad_l2 --> 0.586 | Weights_l2 --> 17731.370 | Lr --> 0.010 | Seconds_per_step --> 2.064 |
[2024-07-29 15:51:27,782][Main][INFO] - [train] Step 6800 out of 120000 | Loss --> 3.770 | Grad_l2 --> 0.577 | Weights_l2 --> 17801.276 | Lr --> 0.010 | Seconds_per_step --> 2.003 |
[2024-07-29 15:54:49,942][Main][INFO] - [train] Step 6900 out of 120000 | Loss --> 3.767 | Grad_l2 --> 0.779 | Weights_l2 --> 17866.853 | Lr --> 0.010 | Seconds_per_step --> 2.022 |
[2024-07-29 15:58:13,387][Main][INFO] - [train] Step 7000 out of 120000 | Loss --> 3.741 | Grad_l2 --> 0.660 | Weights_l2 --> 17934.337 | Lr --> 0.010 | Seconds_per_step --> 2.034 |
[2024-07-29 16:01:35,953][Main][INFO] - [train] Step 7100 out of 120000 | Loss --> 3.728 | Grad_l2 --> 0.847 | Weights_l2 --> 18001.501 | Lr --> 0.010 | Seconds_per_step --> 2.026 |
[2024-07-29 16:04:58,735][Main][INFO] - [train] Step 7200 out of 120000 | Loss --> 3.688 | Grad_l2 --> 0.575 | Weights_l2 --> 18071.673 | Lr --> 0.010 | Seconds_per_step --> 2.028 |
[2024-07-29 16:08:22,267][Main][INFO] - [train] Step 7300 out of 120000 | Loss --> 3.672 | Grad_l2 --> 0.580 | Weights_l2 --> 18142.394 | Lr --> 0.010 | Seconds_per_step --> 2.035 |
[2024-07-29 16:11:43,947][Main][INFO] - [train] Step 7400 out of 120000 | Loss --> 3.629 | Grad_l2 --> 0.578 | Weights_l2 --> 18212.661 | Lr --> 0.010 | Seconds_per_step --> 2.017 |
[2024-07-29 16:15:08,993][Main][INFO] - [train] Step 7500 out of 120000 | Loss --> 3.607 | Grad_l2 --> 0.557 | Weights_l2 --> 18282.972 | Lr --> 0.010 | Seconds_per_step --> 2.050 |
[2024-07-29 16:18:30,050][Main][INFO] - [train] Step 7600 out of 120000 | Loss --> 3.591 | Grad_l2 --> 0.563 | Weights_l2 --> 18354.529 | Lr --> 0.010 | Seconds_per_step --> 2.011 |
[2024-07-29 16:21:51,748][Main][INFO] - [train] Step 7700 out of 120000 | Loss --> 3.582 | Grad_l2 --> 0.710 | Weights_l2 --> 18421.313 | Lr --> 0.010 | Seconds_per_step --> 2.017 |
[2024-07-29 16:25:14,749][Main][INFO] - [train] Step 7800 out of 120000 | Loss --> 3.548 | Grad_l2 --> 0.575 | Weights_l2 --> 18491.211 | Lr --> 0.010 | Seconds_per_step --> 2.030 |
[2024-07-29 16:28:37,366][Main][INFO] - [train] Step 7900 out of 120000 | Loss --> 3.520 | Grad_l2 --> 0.552 | Weights_l2 --> 18561.609 | Lr --> 0.010 | Seconds_per_step --> 2.026 |
[2024-07-29 16:31:59,435][Main][INFO] - [train] Step 8000 out of 120000 | Loss --> 3.518 | Grad_l2 --> 0.549 | Weights_l2 --> 18632.247 | Lr --> 0.010 | Seconds_per_step --> 2.021 |
[2024-07-29 16:35:23,175][Main][INFO] - [train] Step 8100 out of 120000 | Loss --> 3.485 | Grad_l2 --> 0.564 | Weights_l2 --> 18703.082 | Lr --> 0.010 | Seconds_per_step --> 2.037 |
[2024-07-29 16:38:45,752][Main][INFO] - [train] Step 8200 out of 120000 | Loss --> 3.492 | Grad_l2 --> 0.524 | Weights_l2 --> 18772.997 | Lr --> 0.010 | Seconds_per_step --> 2.026 |
[2024-07-29 16:42:20,467][Main][INFO] - [train] Step 8300 out of 120000 | Loss --> 3.465 | Grad_l2 --> 0.578 | Weights_l2 --> 18844.837 | Lr --> 0.010 | Seconds_per_step --> 2.147 |
[2024-07-29 16:45:41,474][Main][INFO] - [train] Step 8400 out of 120000 | Loss --> 3.405 | Grad_l2 --> 0.645 | Weights_l2 --> 18912.440 | Lr --> 0.010 | Seconds_per_step --> 2.010 |
[2024-07-29 16:49:01,677][Main][INFO] - [train] Step 8500 out of 120000 | Loss --> 3.328 | Grad_l2 --> 0.545 | Weights_l2 --> 18981.533 | Lr --> 0.010 | Seconds_per_step --> 2.002 |
[2024-07-29 16:52:24,445][Main][INFO] - [train] Step 8600 out of 120000 | Loss --> 3.270 | Grad_l2 --> 0.523 | Weights_l2 --> 19050.880 | Lr --> 0.010 | Seconds_per_step --> 2.028 |
[2024-07-29 16:55:44,071][Main][INFO] - [train] Step 8700 out of 120000 | Loss --> 3.243 | Grad_l2 --> 0.527 | Weights_l2 --> 19120.923 | Lr --> 0.010 | Seconds_per_step --> 1.996 |
[2024-07-29 16:59:06,620][Main][INFO] - [train] Step 8800 out of 120000 | Loss --> 3.184 | Grad_l2 --> 0.500 | Weights_l2 --> 19190.488 | Lr --> 0.010 | Seconds_per_step --> 2.025 |
[2024-07-29 17:02:28,059][Main][INFO] - [train] Step 8900 out of 120000 | Loss --> 3.156 | Grad_l2 --> 0.609 | Weights_l2 --> 19257.244 | Lr --> 0.010 | Seconds_per_step --> 2.014 |
[2024-07-29 17:05:48,753][Main][INFO] - [train] Step 9000 out of 120000 | Loss --> 3.102 | Grad_l2 --> 0.477 | Weights_l2 --> 19326.897 | Lr --> 0.010 | Seconds_per_step --> 2.007 |
[2024-07-29 17:09:11,440][Main][INFO] - [train] Step 9100 out of 120000 | Loss --> 3.082 | Grad_l2 --> 0.501 | Weights_l2 --> 19396.657 | Lr --> 0.010 | Seconds_per_step --> 2.027 |
[2024-07-29 17:12:35,235][Main][INFO] - [train] Step 9200 out of 120000 | Loss --> 3.084 | Grad_l2 --> 0.503 | Weights_l2 --> 19466.936 | Lr --> 0.010 | Seconds_per_step --> 2.038 |
[2024-07-29 17:15:58,250][Main][INFO] - [train] Step 9300 out of 120000 | Loss --> 3.082 | Grad_l2 --> 0.499 | Weights_l2 --> 19537.292 | Lr --> 0.010 | Seconds_per_step --> 2.030 |
[2024-07-29 17:19:19,754][Main][INFO] - [train] Step 9400 out of 120000 | Loss --> 3.081 | Grad_l2 --> 0.505 | Weights_l2 --> 19608.242 | Lr --> 0.010 | Seconds_per_step --> 2.015 |
[2024-07-29 17:22:42,483][Main][INFO] - [train] Step 9500 out of 120000 | Loss --> 3.046 | Grad_l2 --> 0.499 | Weights_l2 --> 19679.717 | Lr --> 0.010 | Seconds_per_step --> 2.027 |
[2024-07-29 17:26:06,770][Main][INFO] - [train] Step 9600 out of 120000 | Loss --> 3.079 | Grad_l2 --> 0.474 | Weights_l2 --> 19752.634 | Lr --> 0.010 | Seconds_per_step --> 2.043 |
[2024-07-29 17:29:30,236][Main][INFO] - [train] Step 9700 out of 120000 | Loss --> 3.116 | Grad_l2 --> 0.647 | Weights_l2 --> 19822.169 | Lr --> 0.010 | Seconds_per_step --> 2.035 |
[2024-07-29 17:32:53,385][Main][INFO] - [train] Step 9800 out of 120000 | Loss --> 3.094 | Grad_l2 --> 0.502 | Weights_l2 --> 19894.935 | Lr --> 0.010 | Seconds_per_step --> 2.031 |
[2024-07-29 17:36:20,072][Main][INFO] - [train] Step 9900 out of 120000 | Loss --> 3.120 | Grad_l2 --> 0.505 | Weights_l2 --> 19968.236 | Lr --> 0.010 | Seconds_per_step --> 2.067 |
[2024-07-29 17:39:40,264][Main][INFO] - [train] Step 10000 out of 120000 | Loss --> 3.135 | Grad_l2 --> 0.512 | Weights_l2 --> 20041.810 | Lr --> 0.010 | Seconds_per_step --> 2.002 |
[2024-07-29 17:50:42,218][Main][INFO] - [eval] Step 10000 out of 120000 | Loss --> 3.342 | Accuracy --> 0.482 | Time --> 661.952 |
[2024-07-29 17:50:42,221][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-10000
[2024-07-29 17:50:42,225][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-07-29 17:50:45,606][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-10000/model.safetensors
[2024-07-29 17:50:45,657][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-10000/optimizer.bin
[2024-07-29 17:50:45,658][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-10000/scheduler.bin
[2024-07-29 17:50:45,658][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-10000/sampler.bin
[2024-07-29 17:50:45,658][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-10000/sampler_1.bin
[2024-07-29 17:50:45,659][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-10000/random_states_0.pkl
[2024-07-29 17:54:07,797][Main][INFO] - [train] Step 10100 out of 120000 | Loss --> 3.138 | Grad_l2 --> 0.498 | Weights_l2 --> 20115.835 | Lr --> 0.010 | Seconds_per_step --> 2.056 |
[2024-07-29 17:57:29,640][Main][INFO] - [train] Step 10200 out of 120000 | Loss --> 3.134 | Grad_l2 --> 0.507 | Weights_l2 --> 20188.102 | Lr --> 0.010 | Seconds_per_step --> 2.018 |
[2024-07-29 18:00:51,946][Main][INFO] - [train] Step 10300 out of 120000 | Loss --> 3.148 | Grad_l2 --> 0.505 | Weights_l2 --> 20260.823 | Lr --> 0.010 | Seconds_per_step --> 2.023 |
[2024-07-29 18:04:13,640][Main][INFO] - [train] Step 10400 out of 120000 | Loss --> 3.148 | Grad_l2 --> 0.546 | Weights_l2 --> 20331.807 | Lr --> 0.010 | Seconds_per_step --> 2.017 |
[2024-07-29 18:07:36,491][Main][INFO] - [train] Step 10500 out of 120000 | Loss --> 3.166 | Grad_l2 --> 0.747 | Weights_l2 --> 20397.445 | Lr --> 0.010 | Seconds_per_step --> 2.029 |
[2024-07-29 18:10:57,010][Main][INFO] - [train] Step 10600 out of 120000 | Loss --> 3.149 | Grad_l2 --> 0.497 | Weights_l2 --> 20467.514 | Lr --> 0.010 | Seconds_per_step --> 2.005 |
[2024-07-29 18:14:17,835][Main][INFO] - [train] Step 10700 out of 120000 | Loss --> 3.153 | Grad_l2 --> 0.494 | Weights_l2 --> 20538.475 | Lr --> 0.010 | Seconds_per_step --> 2.008 |
[2024-07-29 18:17:41,855][Main][INFO] - [train] Step 10800 out of 120000 | Loss --> 3.140 | Grad_l2 --> 0.489 | Weights_l2 --> 20608.332 | Lr --> 0.010 | Seconds_per_step --> 2.040 |
[2024-07-29 18:21:06,172][Main][INFO] - [train] Step 10900 out of 120000 | Loss --> 3.121 | Grad_l2 --> 0.465 | Weights_l2 --> 20677.903 | Lr --> 0.010 | Seconds_per_step --> 2.043 |
[2024-07-29 18:24:24,051][Main][INFO] - [train] Step 11000 out of 120000 | Loss --> 3.130 | Grad_l2 --> 0.488 | Weights_l2 --> 20746.994 | Lr --> 0.010 | Seconds_per_step --> 1.979 |
[2024-07-29 18:27:48,882][Main][INFO] - [train] Step 11100 out of 120000 | Loss --> 3.101 | Grad_l2 --> 0.481 | Weights_l2 --> 20815.580 | Lr --> 0.009 | Seconds_per_step --> 2.048 |
[2024-07-29 18:31:11,505][Main][INFO] - [train] Step 11200 out of 120000 | Loss --> 3.106 | Grad_l2 --> 0.485 | Weights_l2 --> 20883.517 | Lr --> 0.009 | Seconds_per_step --> 2.026 |
[2024-07-29 18:34:33,665][Main][INFO] - [train] Step 11300 out of 120000 | Loss --> 3.105 | Grad_l2 --> 0.476 | Weights_l2 --> 20951.672 | Lr --> 0.009 | Seconds_per_step --> 2.022 |
[2024-07-29 18:37:57,436][Main][INFO] - [train] Step 11400 out of 120000 | Loss --> 3.109 | Grad_l2 --> 0.475 | Weights_l2 --> 21019.001 | Lr --> 0.009 | Seconds_per_step --> 2.038 |
[2024-07-29 18:41:19,284][Main][INFO] - [train] Step 11500 out of 120000 | Loss --> 3.098 | Grad_l2 --> 0.469 | Weights_l2 --> 21086.577 | Lr --> 0.009 | Seconds_per_step --> 2.018 |
[2024-07-29 18:44:43,202][Main][INFO] - [train] Step 11600 out of 120000 | Loss --> 3.087 | Grad_l2 --> 0.478 | Weights_l2 --> 21153.853 | Lr --> 0.009 | Seconds_per_step --> 2.039 |
[2024-07-29 18:48:03,843][Main][INFO] - [train] Step 11700 out of 120000 | Loss --> 3.078 | Grad_l2 --> 0.463 | Weights_l2 --> 21220.061 | Lr --> 0.009 | Seconds_per_step --> 2.006 |
[2024-07-29 18:51:25,720][Main][INFO] - [train] Step 11800 out of 120000 | Loss --> 3.057 | Grad_l2 --> 0.476 | Weights_l2 --> 21287.244 | Lr --> 0.009 | Seconds_per_step --> 2.019 |
[2024-07-29 18:54:48,037][Main][INFO] - [train] Step 11900 out of 120000 | Loss --> 3.047 | Grad_l2 --> 0.464 | Weights_l2 --> 21353.821 | Lr --> 0.009 | Seconds_per_step --> 2.023 |
[2024-07-29 18:58:09,004][Main][INFO] - [train] Step 12000 out of 120000 | Loss --> 3.046 | Grad_l2 --> 0.468 | Weights_l2 --> 21419.562 | Lr --> 0.009 | Seconds_per_step --> 2.010 |
[2024-07-29 19:01:31,387][Main][INFO] - [train] Step 12100 out of 120000 | Loss --> 3.018 | Grad_l2 --> 0.454 | Weights_l2 --> 21485.491 | Lr --> 0.009 | Seconds_per_step --> 2.024 |
[2024-07-29 19:04:55,037][Main][INFO] - [train] Step 12200 out of 120000 | Loss --> 3.003 | Grad_l2 --> 0.466 | Weights_l2 --> 21551.248 | Lr --> 0.009 | Seconds_per_step --> 2.036 |
[2024-07-29 19:08:17,180][Main][INFO] - [train] Step 12300 out of 120000 | Loss --> 2.984 | Grad_l2 --> 0.453 | Weights_l2 --> 21617.303 | Lr --> 0.009 | Seconds_per_step --> 2.021 |
[2024-07-29 19:11:37,604][Main][INFO] - [train] Step 12400 out of 120000 | Loss --> 2.985 | Grad_l2 --> 0.470 | Weights_l2 --> 21682.415 | Lr --> 0.009 | Seconds_per_step --> 2.004 |
[2024-07-29 19:14:59,236][Main][INFO] - [train] Step 12500 out of 120000 | Loss --> 2.970 | Grad_l2 --> 0.467 | Weights_l2 --> 21748.052 | Lr --> 0.009 | Seconds_per_step --> 2.016 |
[2024-07-29 19:18:23,954][Main][INFO] - [train] Step 12600 out of 120000 | Loss --> 2.956 | Grad_l2 --> 0.450 | Weights_l2 --> 21812.841 | Lr --> 0.009 | Seconds_per_step --> 2.047 |
[2024-07-29 19:21:48,239][Main][INFO] - [train] Step 12700 out of 120000 | Loss --> 2.947 | Grad_l2 --> 0.456 | Weights_l2 --> 21878.304 | Lr --> 0.009 | Seconds_per_step --> 2.043 |
[2024-07-29 19:25:08,082][Main][INFO] - [train] Step 12800 out of 120000 | Loss --> 2.955 | Grad_l2 --> 0.454 | Weights_l2 --> 21942.926 | Lr --> 0.009 | Seconds_per_step --> 1.998 |
[2024-07-29 19:28:33,963][Main][INFO] - [train] Step 12900 out of 120000 | Loss --> 2.943 | Grad_l2 --> 0.458 | Weights_l2 --> 22007.463 | Lr --> 0.009 | Seconds_per_step --> 2.059 |
[2024-07-29 19:31:55,181][Main][INFO] - [train] Step 13000 out of 120000 | Loss --> 2.938 | Grad_l2 --> 0.462 | Weights_l2 --> 22072.197 | Lr --> 0.009 | Seconds_per_step --> 2.012 |
[2024-07-29 19:35:21,007][Main][INFO] - [train] Step 13100 out of 120000 | Loss --> 2.922 | Grad_l2 --> 0.448 | Weights_l2 --> 22137.451 | Lr --> 0.009 | Seconds_per_step --> 2.058 |
[2024-07-29 19:38:39,864][Main][INFO] - [train] Step 13200 out of 120000 | Loss --> 2.921 | Grad_l2 --> 0.454 | Weights_l2 --> 22201.766 | Lr --> 0.009 | Seconds_per_step --> 1.989 |
[2024-07-29 19:41:59,696][Main][INFO] - [train] Step 13300 out of 120000 | Loss --> 2.911 | Grad_l2 --> 0.456 | Weights_l2 --> 22265.739 | Lr --> 0.009 | Seconds_per_step --> 1.998 |
[2024-07-29 19:45:26,363][Main][INFO] - [train] Step 13400 out of 120000 | Loss --> 2.923 | Grad_l2 --> 0.701 | Weights_l2 --> 22325.559 | Lr --> 0.009 | Seconds_per_step --> 2.067 |
[2024-07-29 19:48:46,792][Main][INFO] - [train] Step 13500 out of 120000 | Loss --> 2.900 | Grad_l2 --> 0.475 | Weights_l2 --> 22388.151 | Lr --> 0.009 | Seconds_per_step --> 2.004 |
[2024-07-29 19:52:08,396][Main][INFO] - [train] Step 13600 out of 120000 | Loss --> 2.891 | Grad_l2 --> 0.453 | Weights_l2 --> 22451.013 | Lr --> 0.009 | Seconds_per_step --> 2.016 |
[2024-07-29 19:55:35,568][Main][INFO] - [train] Step 13700 out of 120000 | Loss --> 2.881 | Grad_l2 --> 0.437 | Weights_l2 --> 22513.589 | Lr --> 0.009 | Seconds_per_step --> 2.072 |
[2024-07-29 19:58:55,974][Main][INFO] - [train] Step 13800 out of 120000 | Loss --> 2.879 | Grad_l2 --> 0.448 | Weights_l2 --> 22576.038 | Lr --> 0.009 | Seconds_per_step --> 2.004 |
[2024-07-29 20:02:18,850][Main][INFO] - [train] Step 13900 out of 120000 | Loss --> 2.876 | Grad_l2 --> 0.434 | Weights_l2 --> 22638.331 | Lr --> 0.008 | Seconds_per_step --> 2.029 |
[2024-07-29 20:05:42,787][Main][INFO] - [train] Step 14000 out of 120000 | Loss --> 2.879 | Grad_l2 --> 0.463 | Weights_l2 --> 22699.985 | Lr --> 0.008 | Seconds_per_step --> 2.039 |
[2024-07-29 20:09:05,658][Main][INFO] - [train] Step 14100 out of 120000 | Loss --> 2.896 | Grad_l2 --> 0.444 | Weights_l2 --> 22761.406 | Lr --> 0.008 | Seconds_per_step --> 2.029 |
[2024-07-29 20:12:28,538][Main][INFO] - [train] Step 14200 out of 120000 | Loss --> 2.870 | Grad_l2 --> 0.444 | Weights_l2 --> 22823.234 | Lr --> 0.008 | Seconds_per_step --> 2.029 |
[2024-07-29 20:15:53,551][Main][INFO] - [train] Step 14300 out of 120000 | Loss --> 2.862 | Grad_l2 --> 0.447 | Weights_l2 --> 22885.089 | Lr --> 0.008 | Seconds_per_step --> 2.050 |
[2024-07-29 20:19:14,852][Main][INFO] - [train] Step 14400 out of 120000 | Loss --> 2.854 | Grad_l2 --> 0.435 | Weights_l2 --> 22946.433 | Lr --> 0.008 | Seconds_per_step --> 2.013 |
[2024-07-29 20:22:37,804][Main][INFO] - [train] Step 14500 out of 120000 | Loss --> 2.852 | Grad_l2 --> 0.436 | Weights_l2 --> 23007.114 | Lr --> 0.008 | Seconds_per_step --> 2.030 |
[2024-07-29 20:26:01,646][Main][INFO] - [train] Step 14600 out of 120000 | Loss --> 2.836 | Grad_l2 --> 0.436 | Weights_l2 --> 23067.876 | Lr --> 0.008 | Seconds_per_step --> 2.038 |
[2024-07-29 20:29:24,985][Main][INFO] - [train] Step 14700 out of 120000 | Loss --> 2.816 | Grad_l2 --> 0.450 | Weights_l2 --> 23128.417 | Lr --> 0.008 | Seconds_per_step --> 2.033 |
[2024-07-29 20:32:49,147][Main][INFO] - [train] Step 14800 out of 120000 | Loss --> 2.824 | Grad_l2 --> 0.436 | Weights_l2 --> 23187.896 | Lr --> 0.008 | Seconds_per_step --> 2.042 |
[2024-07-29 20:36:09,982][Main][INFO] - [train] Step 14900 out of 120000 | Loss --> 2.812 | Grad_l2 --> 0.436 | Weights_l2 --> 23248.141 | Lr --> 0.008 | Seconds_per_step --> 2.008 |
[2024-07-29 20:39:34,987][Main][INFO] - [train] Step 15000 out of 120000 | Loss --> 2.812 | Grad_l2 --> 0.433 | Weights_l2 --> 23309.018 | Lr --> 0.008 | Seconds_per_step --> 2.050 |
[2024-07-29 20:50:38,822][Main][INFO] - [eval] Step 15000 out of 120000 | Loss --> 2.922 | Accuracy --> 0.523 | Time --> 663.832 |
[2024-07-29 20:54:01,336][Main][INFO] - [train] Step 15100 out of 120000 | Loss --> 2.821 | Grad_l2 --> 0.436 | Weights_l2 --> 23369.238 | Lr --> 0.008 | Seconds_per_step --> 2.025 |
[2024-07-29 20:57:24,055][Main][INFO] - [train] Step 15200 out of 120000 | Loss --> 2.807 | Grad_l2 --> 0.449 | Weights_l2 --> 23428.564 | Lr --> 0.008 | Seconds_per_step --> 2.027 |
[2024-07-29 21:00:48,992][Main][INFO] - [train] Step 15300 out of 120000 | Loss --> 2.806 | Grad_l2 --> 0.436 | Weights_l2 --> 23488.810 | Lr --> 0.008 | Seconds_per_step --> 2.049 |
[2024-07-29 21:04:10,126][Main][INFO] - [train] Step 15400 out of 120000 | Loss --> 2.801 | Grad_l2 --> 0.432 | Weights_l2 --> 23548.095 | Lr --> 0.008 | Seconds_per_step --> 2.011 |
[2024-07-29 21:07:35,636][Main][INFO] - [train] Step 15500 out of 120000 | Loss --> 2.785 | Grad_l2 --> 0.437 | Weights_l2 --> 23607.635 | Lr --> 0.008 | Seconds_per_step --> 2.055 |
[2024-07-29 21:10:55,539][Main][INFO] - [train] Step 15600 out of 120000 | Loss --> 2.788 | Grad_l2 --> 0.438 | Weights_l2 --> 23667.033 | Lr --> 0.008 | Seconds_per_step --> 1.999 |
[2024-07-29 21:14:16,666][Main][INFO] - [train] Step 15700 out of 120000 | Loss --> 2.789 | Grad_l2 --> 0.430 | Weights_l2 --> 23726.103 | Lr --> 0.008 | Seconds_per_step --> 2.011 |
[2024-07-29 21:17:40,553][Main][INFO] - [train] Step 15800 out of 120000 | Loss --> 2.786 | Grad_l2 --> 0.441 | Weights_l2 --> 23784.986 | Lr --> 0.008 | Seconds_per_step --> 2.039 |
[2024-07-29 21:21:03,874][Main][INFO] - [train] Step 15900 out of 120000 | Loss --> 2.779 | Grad_l2 --> 0.433 | Weights_l2 --> 23843.450 | Lr --> 0.008 | Seconds_per_step --> 2.033 |
[2024-07-29 21:24:26,693][Main][INFO] - [train] Step 16000 out of 120000 | Loss --> 2.800 | Grad_l2 --> 0.444 | Weights_l2 --> 23901.252 | Lr --> 0.008 | Seconds_per_step --> 2.028 |
[2024-07-29 21:27:51,557][Main][INFO] - [train] Step 16100 out of 120000 | Loss --> 2.801 | Grad_l2 --> 0.434 | Weights_l2 --> 23959.393 | Lr --> 0.008 | Seconds_per_step --> 2.049 |
[2024-07-29 21:31:13,663][Main][INFO] - [train] Step 16200 out of 120000 | Loss --> 2.798 | Grad_l2 --> 0.436 | Weights_l2 --> 24017.172 | Lr --> 0.008 | Seconds_per_step --> 2.021 |
[2024-07-29 21:34:34,085][Main][INFO] - [train] Step 16300 out of 120000 | Loss --> 2.794 | Grad_l2 --> 0.444 | Weights_l2 --> 24075.057 | Lr --> 0.008 | Seconds_per_step --> 2.004 |
[2024-07-29 21:37:58,692][Main][INFO] - [train] Step 16400 out of 120000 | Loss --> 2.799 | Grad_l2 --> 0.434 | Weights_l2 --> 24132.532 | Lr --> 0.008 | Seconds_per_step --> 2.046 |
[2024-07-29 21:41:20,158][Main][INFO] - [train] Step 16500 out of 120000 | Loss --> 2.805 | Grad_l2 --> 0.435 | Weights_l2 --> 24189.470 | Lr --> 0.008 | Seconds_per_step --> 2.015 |
[2024-07-29 21:44:41,871][Main][INFO] - [train] Step 16600 out of 120000 | Loss --> 2.780 | Grad_l2 --> 0.433 | Weights_l2 --> 24246.225 | Lr --> 0.008 | Seconds_per_step --> 2.017 |
[2024-07-29 21:48:06,322][Main][INFO] - [train] Step 16700 out of 120000 | Loss --> 2.788 | Grad_l2 --> 0.435 | Weights_l2 --> 24303.620 | Lr --> 0.008 | Seconds_per_step --> 2.045 |
[2024-07-29 21:51:28,693][Main][INFO] - [train] Step 16800 out of 120000 | Loss --> 2.777 | Grad_l2 --> 0.433 | Weights_l2 --> 24359.759 | Lr --> 0.008 | Seconds_per_step --> 2.024 |
[2024-07-29 21:54:51,556][Main][INFO] - [train] Step 16900 out of 120000 | Loss --> 2.783 | Grad_l2 --> 0.443 | Weights_l2 --> 24416.672 | Lr --> 0.008 | Seconds_per_step --> 2.029 |
[2024-07-29 21:58:16,082][Main][INFO] - [train] Step 17000 out of 120000 | Loss --> 2.766 | Grad_l2 --> 0.429 | Weights_l2 --> 24472.967 | Lr --> 0.008 | Seconds_per_step --> 2.045 |
[2024-07-29 22:01:38,555][Main][INFO] - [train] Step 17100 out of 120000 | Loss --> 2.774 | Grad_l2 --> 0.437 | Weights_l2 --> 24529.218 | Lr --> 0.008 | Seconds_per_step --> 2.025 |
[2024-07-29 22:05:00,387][Main][INFO] - [train] Step 17200 out of 120000 | Loss --> 2.783 | Grad_l2 --> 0.434 | Weights_l2 --> 24585.069 | Lr --> 0.008 | Seconds_per_step --> 2.018 |
[2024-07-29 22:08:24,580][Main][INFO] - [train] Step 17300 out of 120000 | Loss --> 2.780 | Grad_l2 --> 0.427 | Weights_l2 --> 24640.512 | Lr --> 0.008 | Seconds_per_step --> 2.042 |
[2024-07-29 22:11:47,138][Main][INFO] - [train] Step 17400 out of 120000 | Loss --> 2.777 | Grad_l2 --> 0.427 | Weights_l2 --> 24695.756 | Lr --> 0.008 | Seconds_per_step --> 2.026 |
[2024-07-29 22:15:07,652][Main][INFO] - [train] Step 17500 out of 120000 | Loss --> 2.779 | Grad_l2 --> 0.430 | Weights_l2 --> 24750.966 | Lr --> 0.008 | Seconds_per_step --> 2.005 |
[2024-07-29 22:18:31,461][Main][INFO] - [train] Step 17600 out of 120000 | Loss --> 2.774 | Grad_l2 --> 0.420 | Weights_l2 --> 24806.104 | Lr --> 0.008 | Seconds_per_step --> 2.038 |
[2024-07-29 22:21:54,206][Main][INFO] - [train] Step 17700 out of 120000 | Loss --> 2.765 | Grad_l2 --> 0.426 | Weights_l2 --> 24860.793 | Lr --> 0.008 | Seconds_per_step --> 2.027 |
[2024-07-29 22:25:14,864][Main][INFO] - [train] Step 17800 out of 120000 | Loss --> 2.767 | Grad_l2 --> 0.430 | Weights_l2 --> 24915.147 | Lr --> 0.007 | Seconds_per_step --> 2.007 |
[2024-07-29 22:28:39,288][Main][INFO] - [train] Step 17900 out of 120000 | Loss --> 2.743 | Grad_l2 --> 0.421 | Weights_l2 --> 24968.748 | Lr --> 0.007 | Seconds_per_step --> 2.044 |
[2024-07-29 22:32:01,539][Main][INFO] - [train] Step 18000 out of 120000 | Loss --> 2.731 | Grad_l2 --> 0.416 | Weights_l2 --> 25022.671 | Lr --> 0.007 | Seconds_per_step --> 2.023 |
[2024-07-29 22:35:20,960][Main][INFO] - [train] Step 18100 out of 120000 | Loss --> 2.738 | Grad_l2 --> 0.433 | Weights_l2 --> 25076.643 | Lr --> 0.007 | Seconds_per_step --> 1.994 |
[2024-07-29 22:38:43,795][Main][INFO] - [train] Step 18200 out of 120000 | Loss --> 2.706 | Grad_l2 --> 0.419 | Weights_l2 --> 25129.452 | Lr --> 0.007 | Seconds_per_step --> 2.028 |
[2024-07-29 22:42:08,002][Main][INFO] - [train] Step 18300 out of 120000 | Loss --> 2.699 | Grad_l2 --> 0.415 | Weights_l2 --> 25181.856 | Lr --> 0.007 | Seconds_per_step --> 2.042 |
[2024-07-29 22:45:26,552][Main][INFO] - [train] Step 18400 out of 120000 | Loss --> 2.699 | Grad_l2 --> 0.420 | Weights_l2 --> 25234.436 | Lr --> 0.007 | Seconds_per_step --> 1.985 |
[2024-07-29 22:48:50,060][Main][INFO] - [train] Step 18500 out of 120000 | Loss --> 2.670 | Grad_l2 --> 0.424 | Weights_l2 --> 25286.745 | Lr --> 0.007 | Seconds_per_step --> 2.035 |
[2024-07-29 22:52:13,752][Main][INFO] - [train] Step 18600 out of 120000 | Loss --> 2.672 | Grad_l2 --> 0.418 | Weights_l2 --> 25338.852 | Lr --> 0.007 | Seconds_per_step --> 2.037 |
[2024-07-29 22:55:33,258][Main][INFO] - [train] Step 18700 out of 120000 | Loss --> 2.696 | Grad_l2 --> 0.667 | Weights_l2 --> 25387.361 | Lr --> 0.007 | Seconds_per_step --> 1.995 |
[2024-07-29 22:58:57,166][Main][INFO] - [train] Step 18800 out of 120000 | Loss --> 2.655 | Grad_l2 --> 0.419 | Weights_l2 --> 25438.784 | Lr --> 0.007 | Seconds_per_step --> 2.039 |
[2024-07-29 23:02:19,875][Main][INFO] - [train] Step 18900 out of 120000 | Loss --> 2.650 | Grad_l2 --> 0.411 | Weights_l2 --> 25489.281 | Lr --> 0.007 | Seconds_per_step --> 2.027 |
[2024-07-29 23:05:41,091][Main][INFO] - [train] Step 19000 out of 120000 | Loss --> 2.648 | Grad_l2 --> 0.417 | Weights_l2 --> 25539.789 | Lr --> 0.007 | Seconds_per_step --> 2.012 |
[2024-07-29 23:09:05,636][Main][INFO] - [train] Step 19100 out of 120000 | Loss --> 2.643 | Grad_l2 --> 0.415 | Weights_l2 --> 25590.138 | Lr --> 0.007 | Seconds_per_step --> 2.045 |
[2024-07-29 23:12:29,599][Main][INFO] - [train] Step 19200 out of 120000 | Loss --> 2.655 | Grad_l2 --> 0.411 | Weights_l2 --> 25640.960 | Lr --> 0.007 | Seconds_per_step --> 2.040 |
[2024-07-29 23:15:50,853][Main][INFO] - [train] Step 19300 out of 120000 | Loss --> 2.629 | Grad_l2 --> 0.417 | Weights_l2 
--> 25691.434 | Lr --> 0.007 | Seconds_per_step --> 2.013 | [2024-07-29 23:19:14,436][Main][INFO] - [train] Step 19400 out of 120000 | Loss --> 2.634 | Grad_l2 --> 0.408 | Weights_l2 --> 25741.335 | Lr --> 0.007 | Seconds_per_step --> 2.036 | [2024-07-29 23:22:37,917][Main][INFO] - [train] Step 19500 out of 120000 | Loss --> 2.634 | Grad_l2 --> 0.412 | Weights_l2 --> 25792.101 | Lr --> 0.007 | Seconds_per_step --> 2.035 | [2024-07-29 23:25:58,985][Main][INFO] - [train] Step 19600 out of 120000 | Loss --> 2.632 | Grad_l2 --> 0.427 | Weights_l2 --> 25841.969 | Lr --> 0.007 | Seconds_per_step --> 2.011 | [2024-07-29 23:29:21,556][Main][INFO] - [train] Step 19700 out of 120000 | Loss --> 2.628 | Grad_l2 --> 0.413 | Weights_l2 --> 25892.307 | Lr --> 0.007 | Seconds_per_step --> 2.026 | [2024-07-29 23:32:45,962][Main][INFO] - [train] Step 19800 out of 120000 | Loss --> 2.628 | Grad_l2 --> 0.419 | Weights_l2 --> 25942.839 | Lr --> 0.007 | Seconds_per_step --> 2.044 | [2024-07-29 23:36:07,602][Main][INFO] - [train] Step 19900 out of 120000 | Loss --> 2.615 | Grad_l2 --> 0.414 | Weights_l2 --> 25993.738 | Lr --> 0.007 | Seconds_per_step --> 2.016 | [2024-07-29 23:39:29,557][Main][INFO] - [train] Step 20000 out of 120000 | Loss --> 2.613 | Grad_l2 --> 0.410 | Weights_l2 --> 26044.671 | Lr --> 0.007 | Seconds_per_step --> 2.020 | [2024-07-29 23:50:30,645][Main][INFO] - [eval] Step 20000 out of 120000 | Loss --> 2.721 | Accuracy --> 0.545 | Time --> 661.086 | [2024-07-29 23:50:30,650][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000 [2024-07-29 23:50:30,654][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-29 23:50:33,798][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors [2024-07-29 23:50:33,848][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin [2024-07-29 23:50:33,849][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin [2024-07-29 23:50:33,849][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin [2024-07-29 23:50:33,849][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin [2024-07-29 23:50:33,850][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl [2024-07-29 23:53:56,278][Main][INFO] - [train] Step 20100 out of 120000 | Loss --> 2.603 | Grad_l2 --> 0.413 | Weights_l2 --> 26095.099 | Lr --> 0.007 | Seconds_per_step --> 2.056 | [2024-07-29 23:57:19,158][Main][INFO] - [train] Step 20200 out of 120000 | Loss --> 2.618 | Grad_l2 --> 0.409 | Weights_l2 --> 26144.738 | Lr --> 0.007 | Seconds_per_step --> 2.029 | [2024-07-30 00:00:41,938][Main][INFO] - [train] Step 20300 out of 120000 | Loss --> 2.610 | Grad_l2 --> 0.413 | Weights_l2 --> 26194.923 | Lr --> 0.007 | Seconds_per_step --> 2.028 | [2024-07-30 00:04:03,188][Main][INFO] - [train] Step 20400 out of 120000 | Loss --> 2.639 | Grad_l2 --> 0.418 | Weights_l2 --> 26244.370 | Lr --> 0.007 | Seconds_per_step --> 2.012 | [2024-07-30 00:07:25,667][Main][INFO] - [train] Step 20500 out of 120000 | Loss --> 2.638 | Grad_l2 --> 0.405 | Weights_l2 --> 26294.154 | Lr --> 0.007 | Seconds_per_step --> 2.025 | [2024-07-30 00:10:49,581][Main][INFO] - [train] Step 20600 out of 120000 | Loss --> 2.638 | Grad_l2 --> 0.410 | Weights_l2 --> 26344.113 | Lr --> 0.007 | Seconds_per_step --> 2.039 | [2024-07-30 00:14:12,292][Main][INFO] - [train] Step 20700 
out of 120000 | Loss --> 2.624 | Grad_l2 --> 0.407 | Weights_l2 --> 26394.011 | Lr --> 0.007 | Seconds_per_step --> 2.027 | [2024-07-30 00:17:33,391][Main][INFO] - [train] Step 20800 out of 120000 | Loss --> 2.631 | Grad_l2 --> 0.410 | Weights_l2 --> 26443.031 | Lr --> 0.007 | Seconds_per_step --> 2.011 | [2024-07-30 00:20:56,536][Main][INFO] - [train] Step 20900 out of 120000 | Loss --> 2.638 | Grad_l2 --> 0.409 | Weights_l2 --> 26492.153 | Lr --> 0.007 | Seconds_per_step --> 2.031 | [2024-07-30 00:24:18,752][Main][INFO] - [train] Step 21000 out of 120000 | Loss --> 2.641 | Grad_l2 --> 0.410 | Weights_l2 --> 26542.252 | Lr --> 0.007 | Seconds_per_step --> 2.022 | [2024-07-30 00:27:41,794][Main][INFO] - [train] Step 21100 out of 120000 | Loss --> 2.644 | Grad_l2 --> 0.407 | Weights_l2 --> 26590.670 | Lr --> 0.007 | Seconds_per_step --> 2.030 | [2024-07-30 00:31:02,854][Main][INFO] - [train] Step 21200 out of 120000 | Loss --> 2.644 | Grad_l2 --> 0.406 | Weights_l2 --> 26639.573 | Lr --> 0.007 | Seconds_per_step --> 2.011 | [2024-07-30 00:34:25,685][Main][INFO] - [train] Step 21300 out of 120000 | Loss --> 2.637 | Grad_l2 --> 0.410 | Weights_l2 --> 26688.243 | Lr --> 0.007 | Seconds_per_step --> 2.028 | [2024-07-30 00:37:49,461][Main][INFO] - [train] Step 21400 out of 120000 | Loss --> 2.655 | Grad_l2 --> 0.497 | Weights_l2 --> 26735.882 | Lr --> 0.007 | Seconds_per_step --> 2.038 | [2024-07-30 00:41:11,865][Main][INFO] - [train] Step 21500 out of 120000 | Loss --> 2.645 | Grad_l2 --> 0.414 | Weights_l2 --> 26783.513 | Lr --> 0.007 | Seconds_per_step --> 2.024 | [2024-07-30 00:44:33,166][Main][INFO] - [train] Step 21600 out of 120000 | Loss --> 2.655 | Grad_l2 --> 0.405 | Weights_l2 --> 26832.349 | Lr --> 0.007 | Seconds_per_step --> 2.013 | [2024-07-30 00:47:53,810][Main][INFO] - [train] Step 21700 out of 120000 | Loss --> 2.640 | Grad_l2 --> 0.403 | Weights_l2 --> 26880.256 | Lr --> 0.007 | Seconds_per_step --> 2.006 | [2024-07-30 00:51:17,756][Main][INFO] - 
[train] Step 21800 out of 120000 | Loss --> 2.648 | Grad_l2 --> 0.405 | Weights_l2 --> 26928.388 | Lr --> 0.007 | Seconds_per_step --> 2.039 | [2024-07-30 00:54:39,771][Main][INFO] - [train] Step 21900 out of 120000 | Loss --> 2.654 | Grad_l2 --> 0.402 | Weights_l2 --> 26976.117 | Lr --> 0.007 | Seconds_per_step --> 2.020 | [2024-07-30 00:58:01,191][Main][INFO] - [train] Step 22000 out of 120000 | Loss --> 2.638 | Grad_l2 --> 0.409 | Weights_l2 --> 27023.892 | Lr --> 0.007 | Seconds_per_step --> 2.014 | [2024-07-30 01:01:23,912][Main][INFO] - [train] Step 22100 out of 120000 | Loss --> 2.626 | Grad_l2 --> 0.406 | Weights_l2 --> 27071.462 | Lr --> 0.007 | Seconds_per_step --> 2.027 | [2024-07-30 01:04:44,436][Main][INFO] - [train] Step 22200 out of 120000 | Loss --> 2.611 | Grad_l2 --> 0.399 | Weights_l2 --> 27118.456 | Lr --> 0.007 | Seconds_per_step --> 2.005 | [2024-07-30 01:08:06,382][Main][INFO] - [train] Step 22300 out of 120000 | Loss --> 2.614 | Grad_l2 --> 0.403 | Weights_l2 --> 27165.152 | Lr --> 0.007 | Seconds_per_step --> 2.019 | [2024-07-30 01:11:28,453][Main][INFO] - [train] Step 22400 out of 120000 | Loss --> 2.610 | Grad_l2 --> 0.408 | Weights_l2 --> 27211.653 | Lr --> 0.007 | Seconds_per_step --> 2.021 | [2024-07-30 01:14:50,274][Main][INFO] - [train] Step 22500 out of 120000 | Loss --> 2.601 | Grad_l2 --> 0.407 | Weights_l2 --> 27258.754 | Lr --> 0.007 | Seconds_per_step --> 2.018 | [2024-07-30 01:18:12,655][Main][INFO] - [train] Step 22600 out of 120000 | Loss --> 2.608 | Grad_l2 --> 0.403 | Weights_l2 --> 27305.652 | Lr --> 0.007 | Seconds_per_step --> 2.024 | [2024-07-30 01:21:37,005][Main][INFO] - [train] Step 22700 out of 120000 | Loss --> 2.600 | Grad_l2 --> 0.403 | Weights_l2 --> 27352.438 | Lr --> 0.007 | Seconds_per_step --> 2.043 | [2024-07-30 01:24:58,627][Main][INFO] - [train] Step 22800 out of 120000 | Loss --> 2.603 | Grad_l2 --> 0.405 | Weights_l2 --> 27399.527 | Lr --> 0.007 | Seconds_per_step --> 2.016 | [2024-07-30 
01:28:20,777][Main][INFO] - [train] Step 22900 out of 120000 | Loss --> 2.598 | Grad_l2 --> 0.410 | Weights_l2 --> 27446.178 | Lr --> 0.007 | Seconds_per_step --> 2.021 | [2024-07-30 01:31:47,654][Main][INFO] - [train] Step 23000 out of 120000 | Loss --> 2.584 | Grad_l2 --> 0.399 | Weights_l2 --> 27492.361 | Lr --> 0.007 | Seconds_per_step --> 2.069 | [2024-07-30 01:35:08,263][Main][INFO] - [train] Step 23100 out of 120000 | Loss --> 2.595 | Grad_l2 --> 0.407 | Weights_l2 --> 27538.617 | Lr --> 0.007 | Seconds_per_step --> 2.006 | [2024-07-30 01:38:31,838][Main][INFO] - [train] Step 23200 out of 120000 | Loss --> 2.574 | Grad_l2 --> 0.396 | Weights_l2 --> 27585.640 | Lr --> 0.007 | Seconds_per_step --> 2.036 | [2024-07-30 01:41:53,370][Main][INFO] - [train] Step 23300 out of 120000 | Loss --> 2.573 | Grad_l2 --> 0.403 | Weights_l2 --> 27631.594 | Lr --> 0.007 | Seconds_per_step --> 2.015 | [2024-07-30 01:45:14,754][Main][INFO] - [train] Step 23400 out of 120000 | Loss --> 2.566 | Grad_l2 --> 0.404 | Weights_l2 --> 27678.023 | Lr --> 0.007 | Seconds_per_step --> 2.014 | [2024-07-30 01:48:38,142][Main][INFO] - [train] Step 23500 out of 120000 | Loss --> 2.542 | Grad_l2 --> 0.400 | Weights_l2 --> 27724.685 | Lr --> 0.007 | Seconds_per_step --> 2.034 | [2024-07-30 01:52:02,254][Main][INFO] - [train] Step 23600 out of 120000 | Loss --> 2.550 | Grad_l2 --> 0.395 | Weights_l2 --> 27770.221 | Lr --> 0.007 | Seconds_per_step --> 2.041 | [2024-07-30 01:55:22,056][Main][INFO] - [train] Step 23700 out of 120000 | Loss --> 2.556 | Grad_l2 --> 0.400 | Weights_l2 --> 27816.158 | Lr --> 0.006 | Seconds_per_step --> 1.998 | [2024-07-30 01:58:45,906][Main][INFO] - [train] Step 23800 out of 120000 | Loss --> 2.549 | Grad_l2 --> 0.403 | Weights_l2 --> 27861.769 | Lr --> 0.006 | Seconds_per_step --> 2.038 | [2024-07-30 02:02:10,160][Main][INFO] - [train] Step 23900 out of 120000 | Loss --> 2.555 | Grad_l2 --> 0.398 | Weights_l2 --> 27907.658 | Lr --> 0.006 | Seconds_per_step --> 2.043 
| [2024-07-30 02:05:30,902][Main][INFO] - [train] Step 24000 out of 120000 | Loss --> 2.543 | Grad_l2 --> 0.401 | Weights_l2 --> 27952.995 | Lr --> 0.006 | Seconds_per_step --> 2.007 | [2024-07-30 02:08:54,138][Main][INFO] - [train] Step 24100 out of 120000 | Loss --> 2.537 | Grad_l2 --> 0.395 | Weights_l2 --> 27998.982 | Lr --> 0.006 | Seconds_per_step --> 2.032 | [2024-07-30 02:12:16,675][Main][INFO] - [train] Step 24200 out of 120000 | Loss --> 2.558 | Grad_l2 --> 0.408 | Weights_l2 --> 28044.060 | Lr --> 0.006 | Seconds_per_step --> 2.025 | [2024-07-30 02:15:37,555][Main][INFO] - [train] Step 24300 out of 120000 | Loss --> 2.541 | Grad_l2 --> 0.404 | Weights_l2 --> 28089.164 | Lr --> 0.006 | Seconds_per_step --> 2.009 | [2024-07-30 02:18:59,595][Main][INFO] - [train] Step 24400 out of 120000 | Loss --> 2.553 | Grad_l2 --> 0.400 | Weights_l2 --> 28134.209 | Lr --> 0.006 | Seconds_per_step --> 2.020 | [2024-07-30 02:22:22,653][Main][INFO] - [train] Step 24500 out of 120000 | Loss --> 2.551 | Grad_l2 --> 0.396 | Weights_l2 --> 28179.047 | Lr --> 0.006 | Seconds_per_step --> 2.031 | [2024-07-30 02:25:42,353][Main][INFO] - [train] Step 24600 out of 120000 | Loss --> 2.553 | Grad_l2 --> 0.398 | Weights_l2 --> 28223.755 | Lr --> 0.006 | Seconds_per_step --> 1.997 | [2024-07-30 02:29:07,872][Main][INFO] - [train] Step 24700 out of 120000 | Loss --> 2.532 | Grad_l2 --> 0.397 | Weights_l2 --> 28267.849 | Lr --> 0.006 | Seconds_per_step --> 2.055 | [2024-07-30 02:32:28,184][Main][INFO] - [train] Step 24800 out of 120000 | Loss --> 2.540 | Grad_l2 --> 0.403 | Weights_l2 --> 28312.249 | Lr --> 0.006 | Seconds_per_step --> 2.003 | [2024-07-30 02:35:50,784][Main][INFO] - [train] Step 24900 out of 120000 | Loss --> 2.527 | Grad_l2 --> 0.396 | Weights_l2 --> 28356.711 | Lr --> 0.006 | Seconds_per_step --> 2.026 | [2024-07-30 02:39:13,961][Main][INFO] - [train] Step 25000 out of 120000 | Loss --> 2.524 | Grad_l2 --> 0.398 | Weights_l2 --> 28399.983 | Lr --> 0.006 | 
Seconds_per_step --> 2.032 | [2024-07-30 02:50:10,641][Main][INFO] - [eval] Step 25000 out of 120000 | Loss --> 2.595 | Accuracy --> 0.557 | Time --> 656.678 | [2024-07-30 02:53:31,641][Main][INFO] - [train] Step 25100 out of 120000 | Loss --> 2.524 | Grad_l2 --> 0.409 | Weights_l2 --> 28444.031 | Lr --> 0.006 | Seconds_per_step --> 2.010 | [2024-07-30 02:56:54,586][Main][INFO] - [train] Step 25200 out of 120000 | Loss --> 2.517 | Grad_l2 --> 0.392 | Weights_l2 --> 28488.178 | Lr --> 0.006 | Seconds_per_step --> 2.029 | [2024-07-30 03:00:16,322][Main][INFO] - [train] Step 25300 out of 120000 | Loss --> 2.525 | Grad_l2 --> 0.393 | Weights_l2 --> 28531.744 | Lr --> 0.006 | Seconds_per_step --> 2.017 | [2024-07-30 03:03:38,294][Main][INFO] - [train] Step 25400 out of 120000 | Loss --> 2.504 | Grad_l2 --> 0.400 | Weights_l2 --> 28574.543 | Lr --> 0.006 | Seconds_per_step --> 2.020 | [2024-07-30 03:07:01,047][Main][INFO] - [train] Step 25500 out of 120000 | Loss --> 2.494 | Grad_l2 --> 0.390 | Weights_l2 --> 28617.761 | Lr --> 0.006 | Seconds_per_step --> 2.028 | [2024-07-30 03:10:21,399][Main][INFO] - [train] Step 25600 out of 120000 | Loss --> 2.494 | Grad_l2 --> 0.399 | Weights_l2 --> 28660.858 | Lr --> 0.006 | Seconds_per_step --> 2.004 | [2024-07-30 03:13:43,729][Main][INFO] - [train] Step 25700 out of 120000 | Loss --> 2.493 | Grad_l2 --> 0.396 | Weights_l2 --> 28703.567 | Lr --> 0.006 | Seconds_per_step --> 2.023 | [2024-07-30 03:17:06,768][Main][INFO] - [train] Step 25800 out of 120000 | Loss --> 2.480 | Grad_l2 --> 0.395 | Weights_l2 --> 28745.835 | Lr --> 0.006 | Seconds_per_step --> 2.030 | [2024-07-30 03:20:27,021][Main][INFO] - [train] Step 25900 out of 120000 | Loss --> 2.481 | Grad_l2 --> 0.392 | Weights_l2 --> 28788.502 | Lr --> 0.006 | Seconds_per_step --> 2.003 | [2024-07-30 03:23:50,790][Main][INFO] - [train] Step 26000 out of 120000 | Loss --> 2.479 | Grad_l2 --> 0.394 | Weights_l2 --> 28831.029 | Lr --> 0.006 | Seconds_per_step --> 2.038 | 
[2024-07-30 03:27:13,043][Main][INFO] - [train] Step 26100 out of 120000 | Loss --> 2.461 | Grad_l2 --> 0.394 | Weights_l2 --> 28873.383 | Lr --> 0.006 | Seconds_per_step --> 2.023 | [2024-07-30 03:30:34,091][Main][INFO] - [train] Step 26200 out of 120000 | Loss --> 2.465 | Grad_l2 --> 0.396 | Weights_l2 --> 28915.722 | Lr --> 0.006 | Seconds_per_step --> 2.010 | [2024-07-30 03:33:57,987][Main][INFO] - [train] Step 26300 out of 120000 | Loss --> 2.473 | Grad_l2 --> 0.392 | Weights_l2 --> 28957.651 | Lr --> 0.006 | Seconds_per_step --> 2.039 | [2024-07-30 03:37:20,784][Main][INFO] - [train] Step 26400 out of 120000 | Loss --> 2.457 | Grad_l2 --> 0.395 | Weights_l2 --> 28999.991 | Lr --> 0.006 | Seconds_per_step --> 2.028 | [2024-07-30 03:40:43,201][Main][INFO] - [train] Step 26500 out of 120000 | Loss --> 2.457 | Grad_l2 --> 0.391 | Weights_l2 --> 29042.185 | Lr --> 0.006 | Seconds_per_step --> 2.024 | [2024-07-30 03:44:03,853][Main][INFO] - [train] Step 26600 out of 120000 | Loss --> 2.452 | Grad_l2 --> 0.392 | Weights_l2 --> 29085.037 | Lr --> 0.006 | Seconds_per_step --> 2.007 | [2024-07-30 03:47:28,624][Main][INFO] - [train] Step 26700 out of 120000 | Loss --> 2.446 | Grad_l2 --> 0.389 | Weights_l2 --> 29128.349 | Lr --> 0.006 | Seconds_per_step --> 2.048 | [2024-07-30 03:50:49,955][Main][INFO] - [train] Step 26800 out of 120000 | Loss --> 2.458 | Grad_l2 --> 0.390 | Weights_l2 --> 29170.894 | Lr --> 0.006 | Seconds_per_step --> 2.013 | [2024-07-30 03:54:09,643][Main][INFO] - [train] Step 26900 out of 120000 | Loss --> 2.457 | Grad_l2 --> 0.392 | Weights_l2 --> 29213.962 | Lr --> 0.006 | Seconds_per_step --> 1.997 | [2024-07-30 03:57:33,869][Main][INFO] - [train] Step 27000 out of 120000 | Loss --> 2.458 | Grad_l2 --> 0.394 | Weights_l2 --> 29256.203 | Lr --> 0.006 | Seconds_per_step --> 2.042 | [2024-07-30 04:00:55,587][Main][INFO] - [train] Step 27100 out of 120000 | Loss --> 2.451 | Grad_l2 --> 0.399 | Weights_l2 --> 29298.811 | Lr --> 0.006 | 
Seconds_per_step --> 2.017 | [2024-07-30 04:04:15,936][Main][INFO] - [train] Step 27200 out of 120000 | Loss --> 2.452 | Grad_l2 --> 0.390 | Weights_l2 --> 29341.279 | Lr --> 0.006 | Seconds_per_step --> 2.003 | [2024-07-30 04:07:41,522][Main][INFO] - [train] Step 27300 out of 120000 | Loss --> 2.448 | Grad_l2 --> 0.395 | Weights_l2 --> 29384.006 | Lr --> 0.006 | Seconds_per_step --> 2.056 | [2024-07-30 04:11:02,636][Main][INFO] - [train] Step 27400 out of 120000 | Loss --> 2.437 | Grad_l2 --> 0.387 | Weights_l2 --> 29427.343 | Lr --> 0.006 | Seconds_per_step --> 2.011 | [2024-07-30 04:14:27,636][Main][INFO] - [train] Step 27500 out of 120000 | Loss --> 2.414 | Grad_l2 --> 0.387 | Weights_l2 --> 29470.071 | Lr --> 0.006 | Seconds_per_step --> 2.050 | [2024-07-30 04:17:50,187][Main][INFO] - [train] Step 27600 out of 120000 | Loss --> 2.383 | Grad_l2 --> 0.383 | Weights_l2 --> 29512.241 | Lr --> 0.006 | Seconds_per_step --> 2.026 | [2024-07-30 04:21:12,057][Main][INFO] - [train] Step 27700 out of 120000 | Loss --> 2.392 | Grad_l2 --> 0.388 | Weights_l2 --> 29554.804 | Lr --> 0.006 | Seconds_per_step --> 2.019 | [2024-07-30 04:24:33,021][Main][INFO] - [train] Step 27800 out of 120000 | Loss --> 2.383 | Grad_l2 --> 0.387 | Weights_l2 --> 29597.106 | Lr --> 0.006 | Seconds_per_step --> 2.010 | [2024-07-30 04:27:56,652][Main][INFO] - [train] Step 27900 out of 120000 | Loss --> 2.426 | Grad_l2 --> 0.387 | Weights_l2 --> 29639.336 | Lr --> 0.006 | Seconds_per_step --> 2.036 | [2024-07-30 04:31:18,391][Main][INFO] - [train] Step 28000 out of 120000 | Loss --> 2.425 | Grad_l2 --> 0.387 | Weights_l2 --> 29681.122 | Lr --> 0.006 | Seconds_per_step --> 2.017 | [2024-07-30 04:34:40,492][Main][INFO] - [train] Step 28100 out of 120000 | Loss --> 2.423 | Grad_l2 --> 0.381 | Weights_l2 --> 29723.204 | Lr --> 0.006 | Seconds_per_step --> 2.021 | [2024-07-30 04:38:03,690][Main][INFO] - [train] Step 28200 out of 120000 | Loss --> 2.430 | Grad_l2 --> 0.388 | Weights_l2 --> 29764.105 | 
Lr --> 0.006 | Seconds_per_step --> 2.032 | [2024-07-30 04:41:25,087][Main][INFO] - [train] Step 28300 out of 120000 | Loss --> 2.441 | Grad_l2 --> 0.393 | Weights_l2 --> 29805.630 | Lr --> 0.006 | Seconds_per_step --> 2.014 | [2024-07-30 04:44:46,955][Main][INFO] - [train] Step 28400 out of 120000 | Loss --> 2.459 | Grad_l2 --> 0.391 | Weights_l2 --> 29847.694 | Lr --> 0.006 | Seconds_per_step --> 2.019 | [2024-07-30 04:48:08,940][Main][INFO] - [train] Step 28500 out of 120000 | Loss --> 2.440 | Grad_l2 --> 0.388 | Weights_l2 --> 29888.807 | Lr --> 0.006 | Seconds_per_step --> 2.020 | [2024-07-30 04:51:31,086][Main][INFO] - [train] Step 28600 out of 120000 | Loss --> 2.452 | Grad_l2 --> 0.392 | Weights_l2 --> 29930.046 | Lr --> 0.006 | Seconds_per_step --> 2.021 | [2024-07-30 04:54:55,094][Main][INFO] - [train] Step 28700 out of 120000 | Loss --> 2.463 | Grad_l2 --> 0.393 | Weights_l2 --> 29971.723 | Lr --> 0.006 | Seconds_per_step --> 2.040 | [2024-07-30 04:58:15,642][Main][INFO] - [train] Step 28800 out of 120000 | Loss --> 2.453 | Grad_l2 --> 0.383 | Weights_l2 --> 30011.959 | Lr --> 0.006 | Seconds_per_step --> 2.005 | [2024-07-30 05:01:42,251][Main][INFO] - [train] Step 28900 out of 120000 | Loss --> 2.457 | Grad_l2 --> 0.388 | Weights_l2 --> 30053.234 | Lr --> 0.006 | Seconds_per_step --> 2.066 | [2024-07-30 05:05:04,556][Main][INFO] - [train] Step 29000 out of 120000 | Loss --> 2.469 | Grad_l2 --> 0.391 | Weights_l2 --> 30094.084 | Lr --> 0.006 | Seconds_per_step --> 2.023 | [2024-07-30 05:08:26,337][Main][INFO] - [train] Step 29100 out of 120000 | Loss --> 2.465 | Grad_l2 --> 0.385 | Weights_l2 --> 30135.324 | Lr --> 0.006 | Seconds_per_step --> 2.018 | [2024-07-30 05:11:50,174][Main][INFO] - [train] Step 29200 out of 120000 | Loss --> 2.463 | Grad_l2 --> 0.389 | Weights_l2 --> 30176.282 | Lr --> 0.006 | Seconds_per_step --> 2.038 | [2024-07-30 05:15:11,938][Main][INFO] - [train] Step 29300 out of 120000 | Loss --> 2.464 | Grad_l2 --> 0.391 | Weights_l2 
--> 30216.978 | Lr --> 0.006 | Seconds_per_step --> 2.018 | [2024-07-30 05:18:34,277][Main][INFO] - [train] Step 29400 out of 120000 | Loss --> 2.477 | Grad_l2 --> 0.390 | Weights_l2 --> 30257.575 | Lr --> 0.006 | Seconds_per_step --> 2.023 | [2024-07-30 05:21:57,238][Main][INFO] - [train] Step 29500 out of 120000 | Loss --> 2.479 | Grad_l2 --> 0.386 | Weights_l2 --> 30298.181 | Lr --> 0.006 | Seconds_per_step --> 2.030 | [2024-07-30 05:25:19,569][Main][INFO] - [train] Step 29600 out of 120000 | Loss --> 2.480 | Grad_l2 --> 0.390 | Weights_l2 --> 30339.056 | Lr --> 0.006 | Seconds_per_step --> 2.023 | [2024-07-30 05:28:42,553][Main][INFO] - [train] Step 29700 out of 120000 | Loss --> 2.479 | Grad_l2 --> 0.393 | Weights_l2 --> 30380.201 | Lr --> 0.006 | Seconds_per_step --> 2.030 | [2024-07-30 05:32:04,139][Main][INFO] - [train] Step 29800 out of 120000 | Loss --> 2.454 | Grad_l2 --> 0.387 | Weights_l2 --> 30420.274 | Lr --> 0.006 | Seconds_per_step --> 2.016 | [2024-07-30 05:35:25,054][Main][INFO] - [train] Step 29900 out of 120000 | Loss --> 2.461 | Grad_l2 --> 0.388 | Weights_l2 --> 30461.111 | Lr --> 0.006 | Seconds_per_step --> 2.009 | [2024-07-30 05:38:48,782][Main][INFO] - [train] Step 30000 out of 120000 | Loss --> 2.452 | Grad_l2 --> 0.390 | Weights_l2 --> 30501.093 | Lr --> 0.006 | Seconds_per_step --> 2.037 | [2024-07-30 05:49:46,107][Main][INFO] - [eval] Step 30000 out of 120000 | Loss --> 2.510 | Accuracy --> 0.568 | Time --> 657.322 | [2024-07-30 05:49:46,111][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-30000 [2024-07-30 05:49:46,114][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-30 05:49:49,352][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-30000/model.safetensors [2024-07-30 05:49:49,402][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-30000/optimizer.bin [2024-07-30 05:49:49,404][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-30000/scheduler.bin [2024-07-30 05:49:49,404][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-30000/sampler.bin [2024-07-30 05:49:49,404][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-30000/sampler_1.bin [2024-07-30 05:49:49,405][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-30000/random_states_0.pkl [2024-07-30 05:53:11,991][Main][INFO] - [train] Step 30100 out of 120000 | Loss --> 2.449 | Grad_l2 --> 0.386 | Weights_l2 --> 30540.970 | Lr --> 0.006 | Seconds_per_step --> 2.059 | [2024-07-30 05:56:32,755][Main][INFO] - [train] Step 30200 out of 120000 | Loss --> 2.445 | Grad_l2 --> 0.388 | Weights_l2 --> 30580.884 | Lr --> 0.006 | Seconds_per_step --> 2.008 | [2024-07-30 05:59:54,338][Main][INFO] - [train] Step 30300 out of 120000 | Loss --> 2.448 | Grad_l2 --> 0.385 | Weights_l2 --> 30620.414 | Lr --> 0.006 | Seconds_per_step --> 2.016 | [2024-07-30 06:03:15,992][Main][INFO] - [train] Step 30400 out of 120000 | Loss --> 2.428 | Grad_l2 --> 0.387 | Weights_l2 --> 30659.855 | Lr --> 0.006 | Seconds_per_step --> 2.017 | [2024-07-30 06:06:37,472][Main][INFO] - [train] Step 30500 out of 120000 | Loss --> 2.403 | Grad_l2 --> 0.384 | Weights_l2 --> 30699.381 | Lr --> 0.006 | Seconds_per_step --> 2.015 | [2024-07-30 06:10:01,985][Main][INFO] - [train] Step 30600 out of 120000 | Loss --> 2.406 | Grad_l2 --> 0.388 | Weights_l2 --> 30738.649 | Lr --> 0.006 | Seconds_per_step --> 2.045 | [2024-07-30 06:13:19,993][Main][INFO] - [train] Step 30700 
out of 120000 | Loss --> 2.403 | Grad_l2 --> 0.382 | Weights_l2 --> 30777.835 | Lr --> 0.006 | Seconds_per_step --> 1.980 | [2024-07-30 06:16:42,446][Main][INFO] - [train] Step 30800 out of 120000 | Loss --> 2.414 | Grad_l2 --> 0.388 | Weights_l2 --> 30816.717 | Lr --> 0.006 | Seconds_per_step --> 2.025 | [2024-07-30 06:20:05,354][Main][INFO] - [train] Step 30900 out of 120000 | Loss --> 2.428 | Grad_l2 --> 0.388 | Weights_l2 --> 30854.897 | Lr --> 0.006 | Seconds_per_step --> 2.029 | [2024-07-30 06:23:25,836][Main][INFO] - [train] Step 31000 out of 120000 | Loss --> 2.433 | Grad_l2 --> 0.387 | Weights_l2 --> 30894.173 | Lr --> 0.006 | Seconds_per_step --> 2.005 | [2024-07-30 06:26:48,734][Main][INFO] - [train] Step 31100 out of 120000 | Loss --> 2.439 | Grad_l2 --> 0.387 | Weights_l2 --> 30933.154 | Lr --> 0.006 | Seconds_per_step --> 2.029 | [2024-07-30 06:30:10,536][Main][INFO] - [train] Step 31200 out of 120000 | Loss --> 2.439 | Grad_l2 --> 0.386 | Weights_l2 --> 30972.267 | Lr --> 0.006 | Seconds_per_step --> 2.018 | [2024-07-30 06:33:32,278][Main][INFO] - [train] Step 31300 out of 120000 | Loss --> 2.439 | Grad_l2 --> 0.390 | Weights_l2 --> 31011.311 | Lr --> 0.006 | Seconds_per_step --> 2.017 | [2024-07-30 06:36:53,565][Main][INFO] - [train] Step 31400 out of 120000 | Loss --> 2.435 | Grad_l2 --> 0.389 | Weights_l2 --> 31049.979 | Lr --> 0.006 | Seconds_per_step --> 2.013 | [2024-07-30 06:40:16,275][Main][INFO] - [train] Step 31500 out of 120000 | Loss --> 2.436 | Grad_l2 --> 0.387 | Weights_l2 --> 31088.704 | Lr --> 0.006 | Seconds_per_step --> 2.027 | [2024-07-30 06:43:38,634][Main][INFO] - [train] Step 31600 out of 120000 | Loss --> 2.425 | Grad_l2 --> 0.387 | Weights_l2 --> 31127.024 | Lr --> 0.006 | Seconds_per_step --> 2.024 | [2024-07-30 06:46:58,975][Main][INFO] - [train] Step 31700 out of 120000 | Loss --> 2.440 | Grad_l2 --> 0.388 | Weights_l2 --> 31165.615 | Lr --> 0.006 | Seconds_per_step --> 2.003 | [2024-07-30 06:50:23,750][Main][INFO] - 
[train] Step 31800 out of 120000 | Loss --> 2.436 | Grad_l2 --> 0.386 | Weights_l2 --> 31203.971 | Lr --> 0.006 | Seconds_per_step --> 2.048 | [2024-07-30 06:53:46,438][Main][INFO] - [train] Step 31900 out of 120000 | Loss --> 2.439 | Grad_l2 --> 0.388 | Weights_l2 --> 31242.344 | Lr --> 0.006 | Seconds_per_step --> 2.027 | [2024-07-30 06:57:07,020][Main][INFO] - [train] Step 32000 out of 120000 | Loss --> 2.431 | Grad_l2 --> 0.388 | Weights_l2 --> 31281.020 | Lr --> 0.006 | Seconds_per_step --> 2.006 | [2024-07-30 07:00:30,539][Main][INFO] - [train] Step 32100 out of 120000 | Loss --> 2.442 | Grad_l2 --> 0.391 | Weights_l2 --> 31319.305 | Lr --> 0.006 | Seconds_per_step --> 2.035 | [2024-07-30 07:03:52,577][Main][INFO] - [train] Step 32200 out of 120000 | Loss --> 2.440 | Grad_l2 --> 0.388 | Weights_l2 --> 31358.172 | Lr --> 0.006 | Seconds_per_step --> 2.020 | [2024-07-30 07:07:15,095][Main][INFO] - [train] Step 32300 out of 120000 | Loss --> 2.424 | Grad_l2 --> 0.387 | Weights_l2 --> 31396.307 | Lr --> 0.006 | Seconds_per_step --> 2.025 | [2024-07-30 07:10:38,169][Main][INFO] - [train] Step 32400 out of 120000 | Loss --> 2.422 | Grad_l2 --> 0.385 | Weights_l2 --> 31434.415 | Lr --> 0.006 | Seconds_per_step --> 2.031 | [2024-07-30 07:13:59,954][Main][INFO] - [train] Step 32500 out of 120000 | Loss --> 2.422 | Grad_l2 --> 0.387 | Weights_l2 --> 31473.239 | Lr --> 0.006 | Seconds_per_step --> 2.018 | [2024-07-30 07:17:23,795][Main][INFO] - [train] Step 32600 out of 120000 | Loss --> 2.410 | Grad_l2 --> 0.381 | Weights_l2 --> 31510.707 | Lr --> 0.006 | Seconds_per_step --> 2.038 | [2024-07-30 07:20:47,101][Main][INFO] - [train] Step 32700 out of 120000 | Loss --> 2.395 | Grad_l2 --> 0.390 | Weights_l2 --> 31548.498 | Lr --> 0.006 | Seconds_per_step --> 2.033 | [2024-07-30 07:24:08,958][Main][INFO] - [train] Step 32800 out of 120000 | Loss --> 2.391 | Grad_l2 --> 0.385 | Weights_l2 --> 31585.865 | Lr --> 0.006 | Seconds_per_step --> 2.019 | [2024-07-30 
07:27:30,386][Main][INFO] - [train] Step 32900 out of 120000 | Loss --> 2.383 | Grad_l2 --> 0.383 | Weights_l2 --> 31622.953 | Lr --> 0.006 | Seconds_per_step --> 2.014 | [2024-07-30 07:30:54,834][Main][INFO] - [train] Step 33000 out of 120000 | Loss --> 2.393 | Grad_l2 --> 0.384 | Weights_l2 --> 31660.053 | Lr --> 0.006 | Seconds_per_step --> 2.044 | [2024-07-30 07:34:16,798][Main][INFO] - [train] Step 33100 out of 120000 | Loss --> 2.388 | Grad_l2 --> 0.386 | Weights_l2 --> 31697.155 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 07:37:38,337][Main][INFO] - [train] Step 33200 out of 120000 | Loss --> 2.389 | Grad_l2 --> 0.381 | Weights_l2 --> 31734.007 | Lr --> 0.005 | Seconds_per_step --> 2.015 | [2024-07-30 07:41:01,985][Main][INFO] - [train] Step 33300 out of 120000 | Loss --> 2.379 | Grad_l2 --> 0.384 | Weights_l2 --> 31771.054 | Lr --> 0.005 | Seconds_per_step --> 2.036 | [2024-07-30 07:44:24,799][Main][INFO] - [train] Step 33400 out of 120000 | Loss --> 2.367 | Grad_l2 --> 0.380 | Weights_l2 --> 31807.896 | Lr --> 0.005 | Seconds_per_step --> 2.028 | [2024-07-30 07:47:46,885][Main][INFO] - [train] Step 33500 out of 120000 | Loss --> 2.361 | Grad_l2 --> 0.383 | Weights_l2 --> 31844.993 | Lr --> 0.005 | Seconds_per_step --> 2.021 | [2024-07-30 07:51:07,542][Main][INFO] - [train] Step 33600 out of 120000 | Loss --> 2.365 | Grad_l2 --> 0.377 | Weights_l2 --> 31882.057 | Lr --> 0.005 | Seconds_per_step --> 2.007 | [2024-07-30 07:54:30,257][Main][INFO] - [train] Step 33700 out of 120000 | Loss --> 2.382 | Grad_l2 --> 0.378 | Weights_l2 --> 31919.148 | Lr --> 0.005 | Seconds_per_step --> 2.027 | [2024-07-30 07:57:50,810][Main][INFO] - [train] Step 33800 out of 120000 | Loss --> 2.380 | Grad_l2 --> 0.377 | Weights_l2 --> 31956.295 | Lr --> 0.005 | Seconds_per_step --> 2.006 | [2024-07-30 08:01:15,184][Main][INFO] - [train] Step 33900 out of 120000 | Loss --> 2.375 | Grad_l2 --> 0.385 | Weights_l2 --> 31993.877 | Lr --> 0.005 | Seconds_per_step --> 2.044 
| [2024-07-30 08:04:36,874][Main][INFO] - [train] Step 34000 out of 120000 | Loss --> 2.405 | Grad_l2 --> 0.380 | Weights_l2 --> 32030.827 | Lr --> 0.005 | Seconds_per_step --> 2.017 | [2024-07-30 08:07:58,456][Main][INFO] - [train] Step 34100 out of 120000 | Loss --> 2.409 | Grad_l2 --> 0.387 | Weights_l2 --> 32068.307 | Lr --> 0.005 | Seconds_per_step --> 2.016 | [2024-07-30 08:11:22,140][Main][INFO] - [train] Step 34200 out of 120000 | Loss --> 2.417 | Grad_l2 --> 0.385 | Weights_l2 --> 32105.378 | Lr --> 0.005 | Seconds_per_step --> 2.037 | [2024-07-30 08:14:43,956][Main][INFO] - [train] Step 34300 out of 120000 | Loss --> 2.433 | Grad_l2 --> 0.392 | Weights_l2 --> 32143.145 | Lr --> 0.005 | Seconds_per_step --> 2.018 | [2024-07-30 08:18:04,773][Main][INFO] - [train] Step 34400 out of 120000 | Loss --> 2.421 | Grad_l2 --> 0.384 | Weights_l2 --> 32180.473 | Lr --> 0.005 | Seconds_per_step --> 2.008 | [2024-07-30 08:21:29,591][Main][INFO] - [train] Step 34500 out of 120000 | Loss --> 2.434 | Grad_l2 --> 0.390 | Weights_l2 --> 32217.779 | Lr --> 0.005 | Seconds_per_step --> 2.048 | [2024-07-30 08:24:51,754][Main][INFO] - [train] Step 34600 out of 120000 | Loss --> 2.420 | Grad_l2 --> 0.387 | Weights_l2 --> 32255.406 | Lr --> 0.005 | Seconds_per_step --> 2.022 | [2024-07-30 08:28:13,852][Main][INFO] - [train] Step 34700 out of 120000 | Loss --> 2.438 | Grad_l2 --> 0.390 | Weights_l2 --> 32293.225 | Lr --> 0.005 | Seconds_per_step --> 2.021 | [2024-07-30 08:31:38,992][Main][INFO] - [train] Step 34800 out of 120000 | Loss --> 2.442 | Grad_l2 --> 0.380 | Weights_l2 --> 32331.286 | Lr --> 0.005 | Seconds_per_step --> 2.051 | [2024-07-30 08:35:01,181][Main][INFO] - [train] Step 34900 out of 120000 | Loss --> 2.453 | Grad_l2 --> 0.388 | Weights_l2 --> 32369.313 | Lr --> 0.005 | Seconds_per_step --> 2.022 | [2024-07-30 08:38:22,386][Main][INFO] - [train] Step 35000 out of 120000 | Loss --> 2.450 | Grad_l2 --> 0.388 | Weights_l2 --> 32407.344 | Lr --> 0.005 | 
Seconds_per_step --> 2.012 | [2024-07-30 08:49:17,617][Main][INFO] - [eval] Step 35000 out of 120000 | Loss --> 2.439 | Accuracy --> 0.576 | Time --> 655.221 | [2024-07-30 08:52:40,638][Main][INFO] - [train] Step 35100 out of 120000 | Loss --> 2.456 | Grad_l2 --> 0.392 | Weights_l2 --> 32445.338 | Lr --> 0.005 | Seconds_per_step --> 2.030 | [2024-07-30 08:56:02,651][Main][INFO] - [train] Step 35200 out of 120000 | Loss --> 2.475 | Grad_l2 --> 0.389 | Weights_l2 --> 32483.696 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 08:59:25,142][Main][INFO] - [train] Step 35300 out of 120000 | Loss --> 2.463 | Grad_l2 --> 0.389 | Weights_l2 --> 32521.588 | Lr --> 0.005 | Seconds_per_step --> 2.025 | [2024-07-30 09:02:47,570][Main][INFO] - [train] Step 35400 out of 120000 | Loss --> 2.490 | Grad_l2 --> 0.536 | Weights_l2 --> 32558.851 | Lr --> 0.005 | Seconds_per_step --> 2.024 | [2024-07-30 09:06:10,855][Main][INFO] - [train] Step 35500 out of 120000 | Loss --> 2.492 | Grad_l2 --> 0.441 | Weights_l2 --> 32596.255 | Lr --> 0.005 | Seconds_per_step --> 2.033 | [2024-07-30 09:09:31,254][Main][INFO] - [train] Step 35600 out of 120000 | Loss --> 2.484 | Grad_l2 --> 0.386 | Weights_l2 --> 32633.765 | Lr --> 0.005 | Seconds_per_step --> 2.004 | [2024-07-30 09:12:55,154][Main][INFO] - [train] Step 35700 out of 120000 | Loss --> 2.493 | Grad_l2 --> 0.397 | Weights_l2 --> 32671.320 | Lr --> 0.005 | Seconds_per_step --> 2.039 | [2024-07-30 09:16:17,454][Main][INFO] - [train] Step 35800 out of 120000 | Loss --> 2.503 | Grad_l2 --> 0.394 | Weights_l2 --> 32708.836 | Lr --> 0.005 | Seconds_per_step --> 2.023 | [2024-07-30 09:19:43,543][Main][INFO] - [train] Step 35900 out of 120000 | Loss --> 2.513 | Grad_l2 --> 0.385 | Weights_l2 --> 32746.345 | Lr --> 0.005 | Seconds_per_step --> 2.061 | [2024-07-30 09:23:04,354][Main][INFO] - [train] Step 36000 out of 120000 | Loss --> 2.518 | Grad_l2 --> 0.394 | Weights_l2 --> 32783.588 | Lr --> 0.005 | Seconds_per_step --> 2.008 | 
[2024-07-30 09:26:26,739][Main][INFO] - [train] Step 36100 out of 120000 | Loss --> 2.515 | Grad_l2 --> 0.393 | Weights_l2 --> 32820.440 | Lr --> 0.005 | Seconds_per_step --> 2.024 | [2024-07-30 09:29:48,390][Main][INFO] - [train] Step 36200 out of 120000 | Loss --> 2.518 | Grad_l2 --> 0.385 | Weights_l2 --> 32857.410 | Lr --> 0.005 | Seconds_per_step --> 2.017 | [2024-07-30 09:33:11,154][Main][INFO] - [train] Step 36300 out of 120000 | Loss --> 2.505 | Grad_l2 --> 0.382 | Weights_l2 --> 32893.923 | Lr --> 0.005 | Seconds_per_step --> 2.028 | [2024-07-30 09:36:34,003][Main][INFO] - [train] Step 36400 out of 120000 | Loss --> 2.489 | Grad_l2 --> 0.387 | Weights_l2 --> 32930.363 | Lr --> 0.005 | Seconds_per_step --> 2.028 | [2024-07-30 09:39:56,354][Main][INFO] - [train] Step 36500 out of 120000 | Loss --> 2.465 | Grad_l2 --> 0.384 | Weights_l2 --> 32966.222 | Lr --> 0.005 | Seconds_per_step --> 2.024 | [2024-07-30 09:43:17,963][Main][INFO] - [train] Step 36600 out of 120000 | Loss --> 2.454 | Grad_l2 --> 0.389 | Weights_l2 --> 33002.024 | Lr --> 0.005 | Seconds_per_step --> 2.016 | [2024-07-30 09:46:40,073][Main][INFO] - [train] Step 36700 out of 120000 | Loss --> 2.445 | Grad_l2 --> 0.380 | Weights_l2 --> 33037.852 | Lr --> 0.005 | Seconds_per_step --> 2.021 | [2024-07-30 09:50:04,170][Main][INFO] - [train] Step 36800 out of 120000 | Loss --> 2.431 | Grad_l2 --> 0.384 | Weights_l2 --> 33073.845 | Lr --> 0.005 | Seconds_per_step --> 2.041 | [2024-07-30 09:53:24,440][Main][INFO] - [train] Step 36900 out of 120000 | Loss --> 2.440 | Grad_l2 --> 0.386 | Weights_l2 --> 33109.220 | Lr --> 0.005 | Seconds_per_step --> 2.003 | [2024-07-30 09:56:47,001][Main][INFO] - [train] Step 37000 out of 120000 | Loss --> 2.429 | Grad_l2 --> 0.383 | Weights_l2 --> 33144.222 | Lr --> 0.005 | Seconds_per_step --> 2.025 | [2024-07-30 10:00:09,282][Main][INFO] - [train] Step 37100 out of 120000 | Loss --> 2.409 | Grad_l2 --> 0.382 | Weights_l2 --> 33178.897 | Lr --> 0.005 | 
Seconds_per_step --> 2.023 | [2024-07-30 10:03:29,370][Main][INFO] - [train] Step 37200 out of 120000 | Loss --> 2.411 | Grad_l2 --> 0.385 | Weights_l2 --> 33213.299 | Lr --> 0.005 | Seconds_per_step --> 2.001 | [2024-07-30 10:06:52,795][Main][INFO] - [train] Step 37300 out of 120000 | Loss --> 2.382 | Grad_l2 --> 0.381 | Weights_l2 --> 33247.595 | Lr --> 0.005 | Seconds_per_step --> 2.034 | [2024-07-30 10:10:11,806][Main][INFO] - [train] Step 37400 out of 120000 | Loss --> 2.388 | Grad_l2 --> 0.382 | Weights_l2 --> 33281.931 | Lr --> 0.005 | Seconds_per_step --> 1.990 | [2024-07-30 10:13:34,940][Main][INFO] - [train] Step 37500 out of 120000 | Loss --> 2.366 | Grad_l2 --> 0.379 | Weights_l2 --> 33315.961 | Lr --> 0.005 | Seconds_per_step --> 2.031 | [2024-07-30 10:16:57,038][Main][INFO] - [train] Step 37600 out of 120000 | Loss --> 2.349 | Grad_l2 --> 0.379 | Weights_l2 --> 33349.587 | Lr --> 0.005 | Seconds_per_step --> 2.021 | [2024-07-30 10:20:15,754][Main][INFO] - [train] Step 37700 out of 120000 | Loss --> 2.327 | Grad_l2 --> 0.380 | Weights_l2 --> 33382.662 | Lr --> 0.005 | Seconds_per_step --> 1.987 | [2024-07-30 10:23:38,600][Main][INFO] - [train] Step 37800 out of 120000 | Loss --> 2.321 | Grad_l2 --> 0.373 | Weights_l2 --> 33416.179 | Lr --> 0.005 | Seconds_per_step --> 2.028 | [2024-07-30 10:27:01,859][Main][INFO] - [train] Step 37900 out of 120000 | Loss --> 2.292 | Grad_l2 --> 0.375 | Weights_l2 --> 33449.336 | Lr --> 0.005 | Seconds_per_step --> 2.033 | [2024-07-30 10:30:21,536][Main][INFO] - [train] Step 38000 out of 120000 | Loss --> 2.291 | Grad_l2 --> 0.377 | Weights_l2 --> 33482.220 | Lr --> 0.005 | Seconds_per_step --> 1.997 | [2024-07-30 10:33:43,002][Main][INFO] - [train] Step 38100 out of 120000 | Loss --> 2.289 | Grad_l2 --> 0.373 | Weights_l2 --> 33514.977 | Lr --> 0.005 | Seconds_per_step --> 2.015 | [2024-07-30 10:37:05,835][Main][INFO] - [train] Step 38200 out of 120000 | Loss --> 2.274 | Grad_l2 --> 0.381 | Weights_l2 --> 33547.981 | 
Lr --> 0.005 | Seconds_per_step --> 2.028 | [2024-07-30 10:40:24,670][Main][INFO] - [train] Step 38300 out of 120000 | Loss --> 2.285 | Grad_l2 --> 0.374 | Weights_l2 --> 33580.350 | Lr --> 0.005 | Seconds_per_step --> 1.988 | [2024-07-30 10:43:47,981][Main][INFO] - [train] Step 38400 out of 120000 | Loss --> 2.290 | Grad_l2 --> 0.376 | Weights_l2 --> 33612.685 | Lr --> 0.005 | Seconds_per_step --> 2.033 | [2024-07-30 10:47:08,696][Main][INFO] - [train] Step 38500 out of 120000 | Loss --> 2.282 | Grad_l2 --> 0.379 | Weights_l2 --> 33644.820 | Lr --> 0.005 | Seconds_per_step --> 2.007 | [2024-07-30 10:50:29,864][Main][INFO] - [train] Step 38600 out of 120000 | Loss --> 2.279 | Grad_l2 --> 0.377 | Weights_l2 --> 33676.690 | Lr --> 0.005 | Seconds_per_step --> 2.012 | [2024-07-30 10:53:52,746][Main][INFO] - [train] Step 38700 out of 120000 | Loss --> 2.275 | Grad_l2 --> 0.375 | Weights_l2 --> 33709.101 | Lr --> 0.005 | Seconds_per_step --> 2.029 | [2024-07-30 10:57:13,745][Main][INFO] - [train] Step 38800 out of 120000 | Loss --> 2.293 | Grad_l2 --> 0.378 | Weights_l2 --> 33741.451 | Lr --> 0.005 | Seconds_per_step --> 2.010 | [2024-07-30 11:00:35,791][Main][INFO] - [train] Step 38900 out of 120000 | Loss --> 2.277 | Grad_l2 --> 0.374 | Weights_l2 --> 33773.534 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 11:03:58,586][Main][INFO] - [train] Step 39000 out of 120000 | Loss --> 2.283 | Grad_l2 --> 0.382 | Weights_l2 --> 33805.770 | Lr --> 0.005 | Seconds_per_step --> 2.028 | [2024-07-30 11:07:18,569][Main][INFO] - [train] Step 39100 out of 120000 | Loss --> 2.283 | Grad_l2 --> 0.373 | Weights_l2 --> 33837.753 | Lr --> 0.005 | Seconds_per_step --> 2.000 | [2024-07-30 11:10:38,682][Main][INFO] - [train] Step 39200 out of 120000 | Loss --> 2.279 | Grad_l2 --> 0.371 | Weights_l2 --> 33869.184 | Lr --> 0.005 | Seconds_per_step --> 2.001 | [2024-07-30 11:14:04,851][Main][INFO] - [train] Step 39300 out of 120000 | Loss --> 2.268 | Grad_l2 --> 0.378 | Weights_l2 
--> 33901.776 | Lr --> 0.005 | Seconds_per_step --> 2.062 | [2024-07-30 11:17:25,393][Main][INFO] - [train] Step 39400 out of 120000 | Loss --> 2.282 | Grad_l2 --> 0.375 | Weights_l2 --> 33934.811 | Lr --> 0.005 | Seconds_per_step --> 2.005 | [2024-07-30 11:20:47,999][Main][INFO] - [train] Step 39500 out of 120000 | Loss --> 2.277 | Grad_l2 --> 0.368 | Weights_l2 --> 33967.043 | Lr --> 0.005 | Seconds_per_step --> 2.026 | [2024-07-30 11:24:08,936][Main][INFO] - [train] Step 39600 out of 120000 | Loss --> 2.269 | Grad_l2 --> 0.376 | Weights_l2 --> 34000.213 | Lr --> 0.005 | Seconds_per_step --> 2.009 | [2024-07-30 11:27:28,635][Main][INFO] - [train] Step 39700 out of 120000 | Loss --> 2.281 | Grad_l2 --> 0.371 | Weights_l2 --> 34033.387 | Lr --> 0.005 | Seconds_per_step --> 1.997 | [2024-07-30 11:30:54,737][Main][INFO] - [train] Step 39800 out of 120000 | Loss --> 2.289 | Grad_l2 --> 0.379 | Weights_l2 --> 34066.602 | Lr --> 0.005 | Seconds_per_step --> 2.061 | [2024-07-30 11:34:16,266][Main][INFO] - [train] Step 39900 out of 120000 | Loss --> 2.297 | Grad_l2 --> 0.381 | Weights_l2 --> 34100.242 | Lr --> 0.005 | Seconds_per_step --> 2.015 | [2024-07-30 11:37:37,355][Main][INFO] - [train] Step 40000 out of 120000 | Loss --> 2.306 | Grad_l2 --> 0.381 | Weights_l2 --> 34133.556 | Lr --> 0.005 | Seconds_per_step --> 2.011 | [2024-07-30 11:48:41,851][Main][INFO] - [eval] Step 40000 out of 120000 | Loss --> 2.383 | Accuracy --> 0.582 | Time --> 664.494 | [2024-07-30 11:48:41,856][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-40000 [2024-07-30 11:48:41,860][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-30 11:48:44,998][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-40000/model.safetensors [2024-07-30 11:48:45,048][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-40000/optimizer.bin [2024-07-30 11:48:45,049][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-40000/scheduler.bin [2024-07-30 11:48:45,050][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-40000/sampler.bin [2024-07-30 11:48:45,050][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-40000/sampler_1.bin [2024-07-30 11:48:45,051][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-40000/random_states_0.pkl [2024-07-30 11:52:07,535][Main][INFO] - [train] Step 40100 out of 120000 | Loss --> 2.321 | Grad_l2 --> 0.372 | Weights_l2 --> 34167.020 | Lr --> 0.005 | Seconds_per_step --> 2.057 | [2024-07-30 11:55:27,286][Main][INFO] - [train] Step 40200 out of 120000 | Loss --> 2.333 | Grad_l2 --> 0.380 | Weights_l2 --> 34200.694 | Lr --> 0.005 | Seconds_per_step --> 1.998 | [2024-07-30 11:58:55,354][Main][INFO] - [train] Step 40300 out of 120000 | Loss --> 2.321 | Grad_l2 --> 0.377 | Weights_l2 --> 34234.409 | Lr --> 0.005 | Seconds_per_step --> 2.081 | [2024-07-30 12:02:19,963][Main][INFO] - [train] Step 40400 out of 120000 | Loss --> 2.297 | Grad_l2 --> 0.374 | Weights_l2 --> 34268.065 | Lr --> 0.005 | Seconds_per_step --> 2.046 | [2024-07-30 12:05:40,581][Main][INFO] - [train] Step 40500 out of 120000 | Loss --> 2.275 | Grad_l2 --> 0.374 | Weights_l2 --> 34301.928 | Lr --> 0.005 | Seconds_per_step --> 2.006 | [2024-07-30 12:09:01,457][Main][INFO] - [train] Step 40600 out of 120000 | Loss --> 2.236 | Grad_l2 --> 0.371 | Weights_l2 --> 34335.456 | Lr --> 0.005 | Seconds_per_step --> 2.009 | [2024-07-30 12:12:26,005][Main][INFO] - [train] Step 40700 
out of 120000 | Loss --> 2.233 | Grad_l2 --> 0.378 | Weights_l2 --> 34368.548 | Lr --> 0.005 | Seconds_per_step --> 2.045 | [2024-07-30 12:15:49,985][Main][INFO] - [train] Step 40800 out of 120000 | Loss --> 2.248 | Grad_l2 --> 0.373 | Weights_l2 --> 34402.050 | Lr --> 0.005 | Seconds_per_step --> 2.040 | [2024-07-30 12:19:13,681][Main][INFO] - [train] Step 40900 out of 120000 | Loss --> 2.253 | Grad_l2 --> 0.372 | Weights_l2 --> 34435.691 | Lr --> 0.005 | Seconds_per_step --> 2.037 | [2024-07-30 12:22:36,912][Main][INFO] - [train] Step 41000 out of 120000 | Loss --> 2.259 | Grad_l2 --> 0.375 | Weights_l2 --> 34469.405 | Lr --> 0.005 | Seconds_per_step --> 2.032 | [2024-07-30 12:25:59,057][Main][INFO] - [train] Step 41100 out of 120000 | Loss --> 2.271 | Grad_l2 --> 0.377 | Weights_l2 --> 34503.645 | Lr --> 0.005 | Seconds_per_step --> 2.021 | [2024-07-30 12:29:23,027][Main][INFO] - [train] Step 41200 out of 120000 | Loss --> 2.264 | Grad_l2 --> 0.372 | Weights_l2 --> 34537.748 | Lr --> 0.005 | Seconds_per_step --> 2.040 | [2024-07-30 12:32:49,892][Main][INFO] - [train] Step 41300 out of 120000 | Loss --> 2.290 | Grad_l2 --> 0.377 | Weights_l2 --> 34571.651 | Lr --> 0.005 | Seconds_per_step --> 2.069 | [2024-07-30 12:36:10,324][Main][INFO] - [train] Step 41400 out of 120000 | Loss --> 2.295 | Grad_l2 --> 0.374 | Weights_l2 --> 34605.580 | Lr --> 0.005 | Seconds_per_step --> 2.004 | [2024-07-30 12:39:34,435][Main][INFO] - [train] Step 41500 out of 120000 | Loss --> 2.293 | Grad_l2 --> 0.377 | Weights_l2 --> 34639.185 | Lr --> 0.005 | Seconds_per_step --> 2.041 | [2024-07-30 12:42:58,969][Main][INFO] - [train] Step 41600 out of 120000 | Loss --> 2.286 | Grad_l2 --> 0.374 | Weights_l2 --> 34672.531 | Lr --> 0.005 | Seconds_per_step --> 2.045 | [2024-07-30 12:46:19,348][Main][INFO] - [train] Step 41700 out of 120000 | Loss --> 2.271 | Grad_l2 --> 0.377 | Weights_l2 --> 34705.893 | Lr --> 0.005 | Seconds_per_step --> 2.004 | [2024-07-30 12:49:45,023][Main][INFO] - 
[train] Step 41800 out of 120000 | Loss --> 2.279 | Grad_l2 --> 0.377 | Weights_l2 --> 34739.403 | Lr --> 0.005 | Seconds_per_step --> 2.057 | [2024-07-30 12:53:07,005][Main][INFO] - [train] Step 41900 out of 120000 | Loss --> 2.259 | Grad_l2 --> 0.372 | Weights_l2 --> 34772.598 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 12:56:29,736][Main][INFO] - [train] Step 42000 out of 120000 | Loss --> 2.258 | Grad_l2 --> 0.378 | Weights_l2 --> 34805.406 | Lr --> 0.005 | Seconds_per_step --> 2.027 | [2024-07-30 12:59:53,186][Main][INFO] - [train] Step 42100 out of 120000 | Loss --> 2.251 | Grad_l2 --> 0.374 | Weights_l2 --> 34838.056 | Lr --> 0.005 | Seconds_per_step --> 2.034 | [2024-07-30 13:03:14,692][Main][INFO] - [train] Step 42200 out of 120000 | Loss --> 2.231 | Grad_l2 --> 0.375 | Weights_l2 --> 34870.734 | Lr --> 0.005 | Seconds_per_step --> 2.015 | [2024-07-30 13:06:36,656][Main][INFO] - [train] Step 42300 out of 120000 | Loss --> 2.239 | Grad_l2 --> 0.368 | Weights_l2 --> 34902.760 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 13:09:59,437][Main][INFO] - [train] Step 42400 out of 120000 | Loss --> 2.236 | Grad_l2 --> 0.369 | Weights_l2 --> 34935.448 | Lr --> 0.005 | Seconds_per_step --> 2.028 | [2024-07-30 13:13:21,326][Main][INFO] - [train] Step 42500 out of 120000 | Loss --> 2.231 | Grad_l2 --> 0.375 | Weights_l2 --> 34967.968 | Lr --> 0.005 | Seconds_per_step --> 2.019 | [2024-07-30 13:16:43,154][Main][INFO] - [train] Step 42600 out of 120000 | Loss --> 2.225 | Grad_l2 --> 0.371 | Weights_l2 --> 34999.113 | Lr --> 0.005 | Seconds_per_step --> 2.018 | [2024-07-30 13:20:06,198][Main][INFO] - [train] Step 42700 out of 120000 | Loss --> 2.238 | Grad_l2 --> 0.374 | Weights_l2 --> 35030.350 | Lr --> 0.005 | Seconds_per_step --> 2.030 | [2024-07-30 13:23:28,890][Main][INFO] - [train] Step 42800 out of 120000 | Loss --> 2.229 | Grad_l2 --> 0.363 | Weights_l2 --> 35062.066 | Lr --> 0.005 | Seconds_per_step --> 2.027 | [2024-07-30 
13:26:49,974][Main][INFO] - [train] Step 42900 out of 120000 | Loss --> 2.225 | Grad_l2 --> 0.374 | Weights_l2 --> 35093.652 | Lr --> 0.005 | Seconds_per_step --> 2.011 | [2024-07-30 13:30:13,152][Main][INFO] - [train] Step 43000 out of 120000 | Loss --> 2.228 | Grad_l2 --> 0.370 | Weights_l2 --> 35125.320 | Lr --> 0.005 | Seconds_per_step --> 2.032 | [2024-07-30 13:33:34,367][Main][INFO] - [train] Step 43100 out of 120000 | Loss --> 2.221 | Grad_l2 --> 0.371 | Weights_l2 --> 35156.980 | Lr --> 0.005 | Seconds_per_step --> 2.012 | [2024-07-30 13:36:54,853][Main][INFO] - [train] Step 43200 out of 120000 | Loss --> 2.222 | Grad_l2 --> 0.370 | Weights_l2 --> 35188.450 | Lr --> 0.005 | Seconds_per_step --> 2.005 | [2024-07-30 13:40:16,545][Main][INFO] - [train] Step 43300 out of 120000 | Loss --> 2.223 | Grad_l2 --> 0.373 | Weights_l2 --> 35219.873 | Lr --> 0.005 | Seconds_per_step --> 2.017 | [2024-07-30 13:43:40,238][Main][INFO] - [train] Step 43400 out of 120000 | Loss --> 2.216 | Grad_l2 --> 0.369 | Weights_l2 --> 35251.220 | Lr --> 0.005 | Seconds_per_step --> 2.037 | [2024-07-30 13:46:59,871][Main][INFO] - [train] Step 43500 out of 120000 | Loss --> 2.218 | Grad_l2 --> 0.373 | Weights_l2 --> 35282.099 | Lr --> 0.005 | Seconds_per_step --> 1.996 | [2024-07-30 13:50:22,888][Main][INFO] - [train] Step 43600 out of 120000 | Loss --> 2.227 | Grad_l2 --> 0.371 | Weights_l2 --> 35313.358 | Lr --> 0.005 | Seconds_per_step --> 2.030 | [2024-07-30 13:53:45,791][Main][INFO] - [train] Step 43700 out of 120000 | Loss --> 2.224 | Grad_l2 --> 0.368 | Weights_l2 --> 35344.243 | Lr --> 0.005 | Seconds_per_step --> 2.029 | [2024-07-30 13:57:04,523][Main][INFO] - [train] Step 43800 out of 120000 | Loss --> 2.224 | Grad_l2 --> 0.374 | Weights_l2 --> 35375.317 | Lr --> 0.005 | Seconds_per_step --> 1.987 | [2024-07-30 14:00:27,848][Main][INFO] - [train] Step 43900 out of 120000 | Loss --> 2.234 | Grad_l2 --> 0.372 | Weights_l2 --> 35406.030 | Lr --> 0.005 | Seconds_per_step --> 2.033 
| [2024-07-30 14:03:52,972][Main][INFO] - [train] Step 44000 out of 120000 | Loss --> 2.222 | Grad_l2 --> 0.366 | Weights_l2 --> 35437.504 | Lr --> 0.005 | Seconds_per_step --> 2.051 | [2024-07-30 14:07:14,372][Main][INFO] - [train] Step 44100 out of 120000 | Loss --> 2.229 | Grad_l2 --> 0.374 | Weights_l2 --> 35468.883 | Lr --> 0.005 | Seconds_per_step --> 2.014 | [2024-07-30 14:10:35,547][Main][INFO] - [train] Step 44200 out of 120000 | Loss --> 2.235 | Grad_l2 --> 0.374 | Weights_l2 --> 35500.142 | Lr --> 0.005 | Seconds_per_step --> 2.012 | [2024-07-30 14:13:59,545][Main][INFO] - [train] Step 44300 out of 120000 | Loss --> 2.239 | Grad_l2 --> 0.368 | Weights_l2 --> 35531.287 | Lr --> 0.005 | Seconds_per_step --> 2.040 | [2024-07-30 14:17:21,446][Main][INFO] - [train] Step 44400 out of 120000 | Loss --> 2.233 | Grad_l2 --> 0.370 | Weights_l2 --> 35562.336 | Lr --> 0.005 | Seconds_per_step --> 2.019 | [2024-07-30 14:20:44,972][Main][INFO] - [train] Step 44500 out of 120000 | Loss --> 2.230 | Grad_l2 --> 0.369 | Weights_l2 --> 35593.493 | Lr --> 0.005 | Seconds_per_step --> 2.035 | [2024-07-30 14:24:06,552][Main][INFO] - [train] Step 44600 out of 120000 | Loss --> 2.262 | Grad_l2 --> 0.377 | Weights_l2 --> 35624.732 | Lr --> 0.005 | Seconds_per_step --> 2.016 | [2024-07-30 14:27:29,670][Main][INFO] - [train] Step 44700 out of 120000 | Loss --> 2.272 | Grad_l2 --> 0.377 | Weights_l2 --> 35656.366 | Lr --> 0.005 | Seconds_per_step --> 2.031 | [2024-07-30 14:30:51,870][Main][INFO] - [train] Step 44800 out of 120000 | Loss --> 2.263 | Grad_l2 --> 0.370 | Weights_l2 --> 35688.042 | Lr --> 0.005 | Seconds_per_step --> 2.022 | [2024-07-30 14:34:14,581][Main][INFO] - [train] Step 44900 out of 120000 | Loss --> 2.276 | Grad_l2 --> 0.373 | Weights_l2 --> 35719.578 | Lr --> 0.005 | Seconds_per_step --> 2.027 | [2024-07-30 14:37:37,397][Main][INFO] - [train] Step 45000 out of 120000 | Loss --> 2.284 | Grad_l2 --> 0.376 | Weights_l2 --> 35751.145 | Lr --> 0.005 | 
Seconds_per_step --> 2.028 | [2024-07-30 14:48:35,808][Main][INFO] - [eval] Step 45000 out of 120000 | Loss --> 2.343 | Accuracy --> 0.587 | Time --> 658.408 | [2024-07-30 14:51:57,447][Main][INFO] - [train] Step 45100 out of 120000 | Loss --> 2.292 | Grad_l2 --> 0.374 | Weights_l2 --> 35783.176 | Lr --> 0.005 | Seconds_per_step --> 2.016 | [2024-07-30 14:55:20,754][Main][INFO] - [train] Step 45200 out of 120000 | Loss --> 2.299 | Grad_l2 --> 0.375 | Weights_l2 --> 35815.425 | Lr --> 0.005 | Seconds_per_step --> 2.033 | [2024-07-30 14:58:43,353][Main][INFO] - [train] Step 45300 out of 120000 | Loss --> 2.308 | Grad_l2 --> 0.385 | Weights_l2 --> 35847.128 | Lr --> 0.005 | Seconds_per_step --> 2.026 | [2024-07-30 15:02:04,838][Main][INFO] - [train] Step 45400 out of 120000 | Loss --> 2.292 | Grad_l2 --> 0.380 | Weights_l2 --> 35878.997 | Lr --> 0.005 | Seconds_per_step --> 2.015 | [2024-07-30 15:05:28,096][Main][INFO] - [train] Step 45500 out of 120000 | Loss --> 2.272 | Grad_l2 --> 0.372 | Weights_l2 --> 35910.857 | Lr --> 0.005 | Seconds_per_step --> 2.033 | [2024-07-30 15:08:50,538][Main][INFO] - [train] Step 45600 out of 120000 | Loss --> 2.285 | Grad_l2 --> 0.373 | Weights_l2 --> 35942.465 | Lr --> 0.005 | Seconds_per_step --> 2.024 | [2024-07-30 15:12:10,893][Main][INFO] - [train] Step 45700 out of 120000 | Loss --> 2.288 | Grad_l2 --> 0.379 | Weights_l2 --> 35974.201 | Lr --> 0.005 | Seconds_per_step --> 2.004 | [2024-07-30 15:15:34,235][Main][INFO] - [train] Step 45800 out of 120000 | Loss --> 2.293 | Grad_l2 --> 0.371 | Weights_l2 --> 36005.963 | Lr --> 0.005 | Seconds_per_step --> 2.033 | [2024-07-30 15:18:55,435][Main][INFO] - [train] Step 45900 out of 120000 | Loss --> 2.285 | Grad_l2 --> 0.380 | Weights_l2 --> 36037.435 | Lr --> 0.005 | Seconds_per_step --> 2.012 | [2024-07-30 15:22:16,491][Main][INFO] - [train] Step 46000 out of 120000 | Loss --> 2.288 | Grad_l2 --> 0.376 | Weights_l2 --> 36068.758 | Lr --> 0.005 | Seconds_per_step --> 2.011 | 
[2024-07-30 15:25:39,771][Main][INFO] - [train] Step 46100 out of 120000 | Loss --> 2.285 | Grad_l2 --> 0.377 | Weights_l2 --> 36099.946 | Lr --> 0.005 | Seconds_per_step --> 2.033 | [2024-07-30 15:29:01,300][Main][INFO] - [train] Step 46200 out of 120000 | Loss --> 2.296 | Grad_l2 --> 0.375 | Weights_l2 --> 36131.248 | Lr --> 0.005 | Seconds_per_step --> 2.015 | [2024-07-30 15:32:23,260][Main][INFO] - [train] Step 46300 out of 120000 | Loss --> 2.291 | Grad_l2 --> 0.371 | Weights_l2 --> 36162.355 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 15:35:46,108][Main][INFO] - [train] Step 46400 out of 120000 | Loss --> 2.281 | Grad_l2 --> 0.373 | Weights_l2 --> 36192.956 | Lr --> 0.005 | Seconds_per_step --> 2.028 | [2024-07-30 15:39:07,035][Main][INFO] - [train] Step 46500 out of 120000 | Loss --> 2.256 | Grad_l2 --> 0.375 | Weights_l2 --> 36223.351 | Lr --> 0.005 | Seconds_per_step --> 2.009 | [2024-07-30 15:42:27,037][Main][INFO] - [train] Step 46600 out of 120000 | Loss --> 2.266 | Grad_l2 --> 0.372 | Weights_l2 --> 36253.947 | Lr --> 0.005 | Seconds_per_step --> 2.000 | [2024-07-30 15:45:49,040][Main][INFO] - [train] Step 46700 out of 120000 | Loss --> 2.260 | Grad_l2 --> 0.371 | Weights_l2 --> 36283.844 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 15:49:12,607][Main][INFO] - [train] Step 46800 out of 120000 | Loss --> 2.267 | Grad_l2 --> 0.476 | Weights_l2 --> 36313.657 | Lr --> 0.005 | Seconds_per_step --> 2.036 | [2024-07-30 15:52:32,746][Main][INFO] - [train] Step 46900 out of 120000 | Loss --> 2.239 | Grad_l2 --> 0.376 | Weights_l2 --> 36344.099 | Lr --> 0.005 | Seconds_per_step --> 2.001 | [2024-07-30 15:55:55,919][Main][INFO] - [train] Step 47000 out of 120000 | Loss --> 2.243 | Grad_l2 --> 0.371 | Weights_l2 --> 36373.803 | Lr --> 0.005 | Seconds_per_step --> 2.032 | [2024-07-30 15:59:16,323][Main][INFO] - [train] Step 47100 out of 120000 | Loss --> 2.239 | Grad_l2 --> 0.373 | Weights_l2 --> 36403.887 | Lr --> 0.005 | 
Seconds_per_step --> 2.004 | [2024-07-30 16:02:37,304][Main][INFO] - [train] Step 47200 out of 120000 | Loss --> 2.228 | Grad_l2 --> 0.369 | Weights_l2 --> 36433.673 | Lr --> 0.005 | Seconds_per_step --> 2.010 | [2024-07-30 16:05:58,961][Main][INFO] - [train] Step 47300 out of 120000 | Loss --> 2.233 | Grad_l2 --> 0.372 | Weights_l2 --> 36463.472 | Lr --> 0.005 | Seconds_per_step --> 2.017 | [2024-07-30 16:09:19,544][Main][INFO] - [train] Step 47400 out of 120000 | Loss --> 2.232 | Grad_l2 --> 0.370 | Weights_l2 --> 36493.831 | Lr --> 0.005 | Seconds_per_step --> 2.006 | [2024-07-30 16:12:41,582][Main][INFO] - [train] Step 47500 out of 120000 | Loss --> 2.225 | Grad_l2 --> 0.372 | Weights_l2 --> 36523.674 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 16:16:02,385][Main][INFO] - [train] Step 47600 out of 120000 | Loss --> 2.241 | Grad_l2 --> 0.376 | Weights_l2 --> 36553.225 | Lr --> 0.005 | Seconds_per_step --> 2.008 | [2024-07-30 16:19:24,075][Main][INFO] - [train] Step 47700 out of 120000 | Loss --> 2.241 | Grad_l2 --> 0.378 | Weights_l2 --> 36582.959 | Lr --> 0.005 | Seconds_per_step --> 2.017 | [2024-07-30 16:22:44,265][Main][INFO] - [train] Step 47800 out of 120000 | Loss --> 2.250 | Grad_l2 --> 0.376 | Weights_l2 --> 36613.317 | Lr --> 0.005 | Seconds_per_step --> 2.002 | [2024-07-30 16:26:07,491][Main][INFO] - [train] Step 47900 out of 120000 | Loss --> 2.249 | Grad_l2 --> 0.369 | Weights_l2 --> 36643.107 | Lr --> 0.005 | Seconds_per_step --> 2.032 | [2024-07-30 16:29:28,936][Main][INFO] - [train] Step 48000 out of 120000 | Loss --> 2.247 | Grad_l2 --> 0.377 | Weights_l2 --> 36672.865 | Lr --> 0.005 | Seconds_per_step --> 2.014 | [2024-07-30 16:32:50,424][Main][INFO] - [train] Step 48100 out of 120000 | Loss --> 2.249 | Grad_l2 --> 0.368 | Weights_l2 --> 36703.091 | Lr --> 0.005 | Seconds_per_step --> 2.015 | [2024-07-30 16:36:11,670][Main][INFO] - [train] Step 48200 out of 120000 | Loss --> 2.257 | Grad_l2 --> 0.372 | Weights_l2 --> 36733.366 | 
Lr --> 0.005 | Seconds_per_step --> 2.012 | [2024-07-30 16:39:32,297][Main][INFO] - [train] Step 48300 out of 120000 | Loss --> 2.268 | Grad_l2 --> 0.375 | Weights_l2 --> 36763.185 | Lr --> 0.005 | Seconds_per_step --> 2.006 | [2024-07-30 16:42:53,257][Main][INFO] - [train] Step 48400 out of 120000 | Loss --> 2.286 | Grad_l2 --> 0.379 | Weights_l2 --> 36793.096 | Lr --> 0.005 | Seconds_per_step --> 2.010 | [2024-07-30 16:46:16,652][Main][INFO] - [train] Step 48500 out of 120000 | Loss --> 2.287 | Grad_l2 --> 0.371 | Weights_l2 --> 36822.811 | Lr --> 0.005 | Seconds_per_step --> 2.034 | [2024-07-30 16:49:37,077][Main][INFO] - [train] Step 48600 out of 120000 | Loss --> 2.281 | Grad_l2 --> 0.376 | Weights_l2 --> 36852.560 | Lr --> 0.005 | Seconds_per_step --> 2.004 | [2024-07-30 16:52:57,226][Main][INFO] - [train] Step 48700 out of 120000 | Loss --> 2.293 | Grad_l2 --> 0.362 | Weights_l2 --> 36882.821 | Lr --> 0.005 | Seconds_per_step --> 2.001 | [2024-07-30 16:56:19,635][Main][INFO] - [train] Step 48800 out of 120000 | Loss --> 2.290 | Grad_l2 --> 0.373 | Weights_l2 --> 36913.004 | Lr --> 0.005 | Seconds_per_step --> 2.024 | [2024-07-30 16:59:41,673][Main][INFO] - [train] Step 48900 out of 120000 | Loss --> 2.277 | Grad_l2 --> 0.373 | Weights_l2 --> 36942.391 | Lr --> 0.005 | Seconds_per_step --> 2.020 | [2024-07-30 17:03:02,479][Main][INFO] - [train] Step 49000 out of 120000 | Loss --> 2.279 | Grad_l2 --> 0.376 | Weights_l2 --> 36972.222 | Lr --> 0.005 | Seconds_per_step --> 2.008 | [2024-07-30 17:06:22,746][Main][INFO] - [train] Step 49100 out of 120000 | Loss --> 2.279 | Grad_l2 --> 0.373 | Weights_l2 --> 37001.718 | Lr --> 0.005 | Seconds_per_step --> 2.003 | [2024-07-30 17:09:44,694][Main][INFO] - [train] Step 49200 out of 120000 | Loss --> 2.289 | Grad_l2 --> 0.371 | Weights_l2 --> 37031.392 | Lr --> 0.005 | Seconds_per_step --> 2.019 | [2024-07-30 17:13:07,354][Main][INFO] - [train] Step 49300 out of 120000 | Loss --> 2.299 | Grad_l2 --> 0.371 | Weights_l2 
--> 37060.814 | Lr --> 0.005 | Seconds_per_step --> 2.027 | [2024-07-30 17:16:29,472][Main][INFO] - [train] Step 49400 out of 120000 | Loss --> 2.281 | Grad_l2 --> 0.372 | Weights_l2 --> 37090.412 | Lr --> 0.004 | Seconds_per_step --> 2.021 | [2024-07-30 17:19:50,874][Main][INFO] - [train] Step 49500 out of 120000 | Loss --> 2.280 | Grad_l2 --> 0.377 | Weights_l2 --> 37120.030 | Lr --> 0.004 | Seconds_per_step --> 2.014 | [2024-07-30 17:23:14,209][Main][INFO] - [train] Step 49600 out of 120000 | Loss --> 2.267 | Grad_l2 --> 0.374 | Weights_l2 --> 37149.541 | Lr --> 0.004 | Seconds_per_step --> 2.033 | [2024-07-30 17:26:38,563][Main][INFO] - [train] Step 49700 out of 120000 | Loss --> 2.252 | Grad_l2 --> 0.372 | Weights_l2 --> 37179.303 | Lr --> 0.004 | Seconds_per_step --> 2.044 | [2024-07-30 17:29:58,886][Main][INFO] - [train] Step 49800 out of 120000 | Loss --> 2.249 | Grad_l2 --> 0.376 | Weights_l2 --> 37209.395 | Lr --> 0.004 | Seconds_per_step --> 2.003 | [2024-07-30 17:33:24,227][Main][INFO] - [train] Step 49900 out of 120000 | Loss --> 2.247 | Grad_l2 --> 0.366 | Weights_l2 --> 37240.003 | Lr --> 0.004 | Seconds_per_step --> 2.053 | [2024-07-30 17:36:45,992][Main][INFO] - [train] Step 50000 out of 120000 | Loss --> 2.231 | Grad_l2 --> 0.367 | Weights_l2 --> 37270.050 | Lr --> 0.004 | Seconds_per_step --> 2.018 | [2024-07-30 17:47:48,881][Main][INFO] - [eval] Step 50000 out of 120000 | Loss --> 2.302 | Accuracy --> 0.591 | Time --> 662.887 | [2024-07-30 17:47:48,889][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-50000 [2024-07-30 17:47:48,893][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-30 17:47:52,159][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-50000/model.safetensors [2024-07-30 17:47:52,211][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-50000/optimizer.bin [2024-07-30 17:47:52,212][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-50000/scheduler.bin [2024-07-30 17:47:52,212][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-50000/sampler.bin [2024-07-30 17:47:52,212][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-50000/sampler_1.bin [2024-07-30 17:47:52,213][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-50000/random_states_0.pkl [2024-07-30 17:51:14,003][Main][INFO] - [train] Step 50100 out of 120000 | Loss --> 2.234 | Grad_l2 --> 0.374 | Weights_l2 --> 37300.363 | Lr --> 0.004 | Seconds_per_step --> 2.051 | [2024-07-30 17:54:38,321][Main][INFO] - [train] Step 50200 out of 120000 | Loss --> 2.227 | Grad_l2 --> 0.371 | Weights_l2 --> 37330.546 | Lr --> 0.004 | Seconds_per_step --> 2.043 | [2024-07-30 17:57:59,738][Main][INFO] - [train] Step 50300 out of 120000 | Loss --> 2.236 | Grad_l2 --> 0.366 | Weights_l2 --> 37360.862 | Lr --> 0.004 | Seconds_per_step --> 2.014 | [2024-07-30 18:01:21,592][Main][INFO] - [train] Step 50400 out of 120000 | Loss --> 2.227 | Grad_l2 --> 0.376 | Weights_l2 --> 37390.820 | Lr --> 0.004 | Seconds_per_step --> 2.019 | [2024-07-30 18:04:44,960][Main][INFO] - [train] Step 50500 out of 120000 | Loss --> 2.227 | Grad_l2 --> 0.374 | Weights_l2 --> 37420.975 | Lr --> 0.004 | Seconds_per_step --> 2.034 | [2024-07-30 18:08:08,960][Main][INFO] - [train] Step 50600 out of 120000 | Loss --> 2.220 | Grad_l2 --> 0.373 | Weights_l2 --> 37451.246 | Lr --> 0.004 | Seconds_per_step --> 2.040 | [2024-07-30 18:11:30,939][Main][INFO] - [train] Step 50700 
out of 120000 | Loss --> 2.218 | Grad_l2 --> 0.373 | Weights_l2 --> 37481.491 | Lr --> 0.004 | Seconds_per_step --> 2.020 | [2024-07-30 18:14:57,253][Main][INFO] - [train] Step 50800 out of 120000 | Loss --> 2.235 | Grad_l2 --> 0.371 | Weights_l2 --> 37511.729 | Lr --> 0.004 | Seconds_per_step --> 2.063 | [2024-07-30 18:18:18,754][Main][INFO] - [train] Step 50900 out of 120000 | Loss --> 2.235 | Grad_l2 --> 0.372 | Weights_l2 --> 37542.187 | Lr --> 0.004 | Seconds_per_step --> 2.015 | [2024-07-30 18:21:41,546][Main][INFO] - [train] Step 51000 out of 120000 | Loss --> 2.251 | Grad_l2 --> 0.373 | Weights_l2 --> 37572.573 | Lr --> 0.004 | Seconds_per_step --> 2.028 | [2024-07-30 18:25:04,972][Main][INFO] - [train] Step 51100 out of 120000 | Loss --> 2.269 | Grad_l2 --> 0.372 | Weights_l2 --> 37603.124 | Lr --> 0.004 | Seconds_per_step --> 2.034 | [2024-07-30 18:28:27,161][Main][INFO] - [train] Step 51200 out of 120000 | Loss --> 2.278 | Grad_l2 --> 0.375 | Weights_l2 --> 37633.079 | Lr --> 0.004 | Seconds_per_step --> 2.022 | [2024-07-30 18:31:49,356][Main][INFO] - [train] Step 51300 out of 120000 | Loss --> 2.263 | Grad_l2 --> 0.374 | Weights_l2 --> 37663.081 | Lr --> 0.004 | Seconds_per_step --> 2.022 | [2024-07-30 18:35:12,293][Main][INFO] - [train] Step 51400 out of 120000 | Loss --> 2.271 | Grad_l2 --> 0.376 | Weights_l2 --> 37693.233 | Lr --> 0.004 | Seconds_per_step --> 2.029 | [2024-07-30 18:38:32,259][Main][INFO] - [train] Step 51500 out of 120000 | Loss --> 2.272 | Grad_l2 --> 0.370 | Weights_l2 --> 37722.954 | Lr --> 0.004 | Seconds_per_step --> 2.000 | [2024-07-30 18:41:54,182][Main][INFO] - [train] Step 51600 out of 120000 | Loss --> 2.284 | Grad_l2 --> 0.375 | Weights_l2 --> 37752.592 | Lr --> 0.004 | Seconds_per_step --> 2.019 | [2024-07-30 18:45:17,904][Main][INFO] - [train] Step 51700 out of 120000 | Loss --> 2.287 | Grad_l2 --> 0.379 | Weights_l2 --> 37782.113 | Lr --> 0.004 | Seconds_per_step --> 2.037 | [2024-07-30 18:48:39,155][Main][INFO] - 
[train] Step 51800 out of 120000 | Loss --> 2.296 | Grad_l2 --> 0.380 | Weights_l2 --> 37811.733 | Lr --> 0.004 | Seconds_per_step --> 2.013 | [2024-07-30 18:51:59,885][Main][INFO] - [train] Step 51900 out of 120000 | Loss --> 2.288 | Grad_l2 --> 0.380 | Weights_l2 --> 37841.146 | Lr --> 0.004 | Seconds_per_step --> 2.007 | [2024-07-30 18:55:23,798][Main][INFO] - [train] Step 52000 out of 120000 | Loss --> 2.305 | Grad_l2 --> 0.373 | Weights_l2 --> 37870.184 | Lr --> 0.004 | Seconds_per_step --> 2.039 | [2024-07-30 18:58:43,533][Main][INFO] - [train] Step 52100 out of 120000 | Loss --> 2.305 | Grad_l2 --> 0.371 | Weights_l2 --> 37899.354 | Lr --> 0.004 | Seconds_per_step --> 1.997 | [2024-07-30 19:02:04,562][Main][INFO] - [train] Step 52200 out of 120000 | Loss --> 2.325 | Grad_l2 --> 0.380 | Weights_l2 --> 37928.725 | Lr --> 0.004 | Seconds_per_step --> 2.010 | [2024-07-30 19:05:28,104][Main][INFO] - [train] Step 52300 out of 120000 | Loss --> 2.314 | Grad_l2 --> 0.373 | Weights_l2 --> 37957.827 | Lr --> 0.004 | Seconds_per_step --> 2.035 | [2024-07-30 19:08:47,978][Main][INFO] - [train] Step 52400 out of 120000 | Loss --> 2.316 | Grad_l2 --> 0.376 | Weights_l2 --> 37986.833 | Lr --> 0.004 | Seconds_per_step --> 1.999 | [2024-07-30 19:12:09,875][Main][INFO] - [train] Step 52500 out of 120000 | Loss --> 2.299 | Grad_l2 --> 0.374 | Weights_l2 --> 38015.697 | Lr --> 0.004 | Seconds_per_step --> 2.019 | [2024-07-30 19:15:30,679][Main][INFO] - [train] Step 52600 out of 120000 | Loss --> 2.299 | Grad_l2 --> 0.376 | Weights_l2 --> 38044.471 | Lr --> 0.004 | Seconds_per_step --> 2.008 | [2024-07-30 19:18:51,635][Main][INFO] - [train] Step 52700 out of 120000 | Loss --> 2.298 | Grad_l2 --> 0.375 | Weights_l2 --> 38073.017 | Lr --> 0.004 | Seconds_per_step --> 2.010 | [2024-07-30 19:22:15,189][Main][INFO] - [train] Step 52800 out of 120000 | Loss --> 2.285 | Grad_l2 --> 0.370 | Weights_l2 --> 38101.507 | Lr --> 0.004 | Seconds_per_step --> 2.036 | [2024-07-30 
19:25:35,054][Main][INFO] - [train] Step 52900 out of 120000 | Loss --> 2.278 | Grad_l2 --> 0.373 | Weights_l2 --> 38130.046 | Lr --> 0.004 | Seconds_per_step --> 1.999 | [2024-07-30 19:28:55,094][Main][INFO] - [train] Step 53000 out of 120000 | Loss --> 2.269 | Grad_l2 --> 0.376 | Weights_l2 --> 38158.989 | Lr --> 0.004 | Seconds_per_step --> 2.000 | [2024-07-30 19:32:18,106][Main][INFO] - [train] Step 53100 out of 120000 | Loss --> 2.255 | Grad_l2 --> 0.372 | Weights_l2 --> 38187.343 | Lr --> 0.004 | Seconds_per_step --> 2.030 | [2024-07-30 19:35:38,376][Main][INFO] - [train] Step 53200 out of 120000 | Loss --> 2.248 | Grad_l2 --> 0.371 | Weights_l2 --> 38215.704 | Lr --> 0.004 | Seconds_per_step --> 2.003 | [2024-07-30 19:39:01,308][Main][INFO] - [train] Step 53300 out of 120000 | Loss --> 2.248 | Grad_l2 --> 0.372 | Weights_l2 --> 38244.336 | Lr --> 0.004 | Seconds_per_step --> 2.029 | [2024-07-30 19:42:21,152][Main][INFO] - [train] Step 53400 out of 120000 | Loss --> 2.241 | Grad_l2 --> 0.372 | Weights_l2 --> 38272.434 | Lr --> 0.004 | Seconds_per_step --> 1.998 | [2024-07-30 19:45:42,736][Main][INFO] - [train] Step 53500 out of 120000 | Loss --> 2.244 | Grad_l2 --> 0.379 | Weights_l2 --> 38300.877 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-30 19:49:05,347][Main][INFO] - [train] Step 53600 out of 120000 | Loss --> 2.242 | Grad_l2 --> 0.372 | Weights_l2 --> 38328.624 | Lr --> 0.004 | Seconds_per_step --> 2.026 | [2024-07-30 19:52:25,890][Main][INFO] - [train] Step 53700 out of 120000 | Loss --> 2.251 | Grad_l2 --> 0.375 | Weights_l2 --> 38357.542 | Lr --> 0.004 | Seconds_per_step --> 2.005 | [2024-07-30 19:55:47,792][Main][INFO] - [train] Step 53800 out of 120000 | Loss --> 2.245 | Grad_l2 --> 0.371 | Weights_l2 --> 38385.811 | Lr --> 0.004 | Seconds_per_step --> 2.019 | [2024-07-30 19:59:11,561][Main][INFO] - [train] Step 53900 out of 120000 | Loss --> 2.246 | Grad_l2 --> 0.372 | Weights_l2 --> 38413.762 | Lr --> 0.004 | Seconds_per_step --> 2.038 
| [2024-07-30 20:02:30,745][Main][INFO] - [train] Step 54000 out of 120000 | Loss --> 2.233 | Grad_l2 --> 0.373 | Weights_l2 --> 38441.996 | Lr --> 0.004 | Seconds_per_step --> 1.992 | [2024-07-30 20:05:53,021][Main][INFO] - [train] Step 54100 out of 120000 | Loss --> 2.253 | Grad_l2 --> 0.375 | Weights_l2 --> 38469.705 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-30 20:09:14,850][Main][INFO] - [train] Step 54200 out of 120000 | Loss --> 2.234 | Grad_l2 --> 0.375 | Weights_l2 --> 38497.454 | Lr --> 0.004 | Seconds_per_step --> 2.018 | [2024-07-30 20:12:36,487][Main][INFO] - [train] Step 54300 out of 120000 | Loss --> 2.229 | Grad_l2 --> 0.371 | Weights_l2 --> 38524.856 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-30 20:15:54,970][Main][INFO] - [train] Step 54400 out of 120000 | Loss --> 2.230 | Grad_l2 --> 0.372 | Weights_l2 --> 38552.158 | Lr --> 0.004 | Seconds_per_step --> 1.985 | [2024-07-30 20:19:20,131][Main][INFO] - [train] Step 54500 out of 120000 | Loss --> 2.216 | Grad_l2 --> 0.378 | Weights_l2 --> 38579.701 | Lr --> 0.004 | Seconds_per_step --> 2.052 | [2024-07-30 20:22:41,206][Main][INFO] - [train] Step 54600 out of 120000 | Loss --> 2.219 | Grad_l2 --> 0.369 | Weights_l2 --> 38607.046 | Lr --> 0.004 | Seconds_per_step --> 2.011 | [2024-07-30 20:26:02,533][Main][INFO] - [train] Step 54700 out of 120000 | Loss --> 2.219 | Grad_l2 --> 0.370 | Weights_l2 --> 38634.725 | Lr --> 0.004 | Seconds_per_step --> 2.013 | [2024-07-30 20:29:25,935][Main][INFO] - [train] Step 54800 out of 120000 | Loss --> 2.216 | Grad_l2 --> 0.372 | Weights_l2 --> 38662.175 | Lr --> 0.004 | Seconds_per_step --> 2.034 | [2024-07-30 20:32:45,352][Main][INFO] - [train] Step 54900 out of 120000 | Loss --> 2.208 | Grad_l2 --> 0.370 | Weights_l2 --> 38689.484 | Lr --> 0.004 | Seconds_per_step --> 1.994 | [2024-07-30 20:36:07,195][Main][INFO] - [train] Step 55000 out of 120000 | Loss --> 2.206 | Grad_l2 --> 0.369 | Weights_l2 --> 38717.016 | Lr --> 0.004 | 
Seconds_per_step --> 2.018 | [2024-07-30 20:47:02,919][Main][INFO] - [eval] Step 55000 out of 120000 | Loss --> 2.267 | Accuracy --> 0.596 | Time --> 655.721 | [2024-07-30 20:50:23,482][Main][INFO] - [train] Step 55100 out of 120000 | Loss --> 2.208 | Grad_l2 --> 0.368 | Weights_l2 --> 38744.131 | Lr --> 0.004 | Seconds_per_step --> 2.006 | [2024-07-30 20:53:44,823][Main][INFO] - [train] Step 55200 out of 120000 | Loss --> 2.214 | Grad_l2 --> 0.371 | Weights_l2 --> 38771.204 | Lr --> 0.004 | Seconds_per_step --> 2.013 | [2024-07-30 20:57:04,281][Main][INFO] - [train] Step 55300 out of 120000 | Loss --> 2.214 | Grad_l2 --> 0.374 | Weights_l2 --> 38798.600 | Lr --> 0.004 | Seconds_per_step --> 1.995 | [2024-07-30 21:00:28,853][Main][INFO] - [train] Step 55400 out of 120000 | Loss --> 2.204 | Grad_l2 --> 0.375 | Weights_l2 --> 38825.557 | Lr --> 0.004 | Seconds_per_step --> 2.046 | [2024-07-30 21:03:48,855][Main][INFO] - [train] Step 55500 out of 120000 | Loss --> 2.222 | Grad_l2 --> 0.371 | Weights_l2 --> 38852.861 | Lr --> 0.004 | Seconds_per_step --> 2.000 | [2024-07-30 21:07:11,107][Main][INFO] - [train] Step 55600 out of 120000 | Loss --> 2.210 | Grad_l2 --> 0.369 | Weights_l2 --> 38879.812 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-30 21:10:32,999][Main][INFO] - [train] Step 55700 out of 120000 | Loss --> 2.225 | Grad_l2 --> 0.376 | Weights_l2 --> 38907.227 | Lr --> 0.004 | Seconds_per_step --> 2.019 | [2024-07-30 21:13:52,570][Main][INFO] - [train] Step 55800 out of 120000 | Loss --> 2.234 | Grad_l2 --> 0.376 | Weights_l2 --> 38934.419 | Lr --> 0.004 | Seconds_per_step --> 1.996 | [2024-07-30 21:17:14,835][Main][INFO] - [train] Step 55900 out of 120000 | Loss --> 2.225 | Grad_l2 --> 0.370 | Weights_l2 --> 38961.858 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-30 21:20:35,489][Main][INFO] - [train] Step 56000 out of 120000 | Loss --> 2.236 | Grad_l2 --> 0.367 | Weights_l2 --> 38988.746 | Lr --> 0.004 | Seconds_per_step --> 2.007 | 
[2024-07-30 21:23:55,860][Main][INFO] - [train] Step 56100 out of 120000 | Loss --> 2.236 | Grad_l2 --> 0.369 | Weights_l2 --> 39016.098 | Lr --> 0.004 | Seconds_per_step --> 2.004 | [2024-07-30 21:27:18,561][Main][INFO] - [train] Step 56200 out of 120000 | Loss --> 2.231 | Grad_l2 --> 0.374 | Weights_l2 --> 39043.354 | Lr --> 0.004 | Seconds_per_step --> 2.027 | [2024-07-30 21:30:41,119][Main][INFO] - [train] Step 56300 out of 120000 | Loss --> 2.238 | Grad_l2 --> 0.370 | Weights_l2 --> 39070.247 | Lr --> 0.004 | Seconds_per_step --> 2.026 | [2024-07-30 21:34:02,537][Main][INFO] - [train] Step 56400 out of 120000 | Loss --> 2.216 | Grad_l2 --> 0.372 | Weights_l2 --> 39097.013 | Lr --> 0.004 | Seconds_per_step --> 2.014 | [2024-07-30 21:37:24,720][Main][INFO] - [train] Step 56500 out of 120000 | Loss --> 2.215 | Grad_l2 --> 0.372 | Weights_l2 --> 39124.268 | Lr --> 0.004 | Seconds_per_step --> 2.022 | [2024-07-30 21:40:46,852][Main][INFO] - [train] Step 56600 out of 120000 | Loss --> 2.226 | Grad_l2 --> 0.369 | Weights_l2 --> 39151.176 | Lr --> 0.004 | Seconds_per_step --> 2.021 | [2024-07-30 21:44:06,823][Main][INFO] - [train] Step 56700 out of 120000 | Loss --> 2.222 | Grad_l2 --> 0.370 | Weights_l2 --> 39178.418 | Lr --> 0.004 | Seconds_per_step --> 2.000 | [2024-07-30 21:47:29,223][Main][INFO] - [train] Step 56800 out of 120000 | Loss --> 2.211 | Grad_l2 --> 0.371 | Weights_l2 --> 39205.711 | Lr --> 0.004 | Seconds_per_step --> 2.024 | [2024-07-30 21:50:51,104][Main][INFO] - [train] Step 56900 out of 120000 | Loss --> 2.222 | Grad_l2 --> 0.377 | Weights_l2 --> 39232.850 | Lr --> 0.004 | Seconds_per_step --> 2.019 | [2024-07-30 21:55:12,437][Main][INFO] - [train] Step 57000 out of 120000 | Loss --> 2.216 | Grad_l2 --> 0.367 | Weights_l2 --> 39259.950 | Lr --> 0.004 | Seconds_per_step --> 2.613 | [2024-07-30 21:58:33,040][Main][INFO] - [train] Step 57100 out of 120000 | Loss --> 2.211 | Grad_l2 --> 0.367 | Weights_l2 --> 39287.307 | Lr --> 0.004 | 
Seconds_per_step --> 2.006 | [2024-07-30 22:01:52,148][Main][INFO] - [train] Step 57200 out of 120000 | Loss --> 2.194 | Grad_l2 --> 0.380 | Weights_l2 --> 39314.615 | Lr --> 0.004 | Seconds_per_step --> 1.991 | [2024-07-30 22:05:11,397][Main][INFO] - [train] Step 57300 out of 120000 | Loss --> 2.186 | Grad_l2 --> 0.373 | Weights_l2 --> 39341.266 | Lr --> 0.004 | Seconds_per_step --> 1.992 | [2024-07-30 22:08:33,855][Main][INFO] - [train] Step 57400 out of 120000 | Loss --> 2.174 | Grad_l2 --> 0.377 | Weights_l2 --> 39368.194 | Lr --> 0.004 | Seconds_per_step --> 2.025 | [2024-07-30 22:11:55,104][Main][INFO] - [train] Step 57500 out of 120000 | Loss --> 2.157 | Grad_l2 --> 0.370 | Weights_l2 --> 39394.858 | Lr --> 0.004 | Seconds_per_step --> 2.012 | [2024-07-30 22:15:18,524][Main][INFO] - [train] Step 57600 out of 120000 | Loss --> 2.134 | Grad_l2 --> 0.379 | Weights_l2 --> 39421.793 | Lr --> 0.004 | Seconds_per_step --> 2.034 | [2024-07-30 22:18:39,927][Main][INFO] - [train] Step 57700 out of 120000 | Loss --> 2.121 | Grad_l2 --> 0.372 | Weights_l2 --> 39448.308 | Lr --> 0.004 | Seconds_per_step --> 2.014 | [2024-07-30 22:21:57,870][Main][INFO] - [train] Step 57800 out of 120000 | Loss --> 2.117 | Grad_l2 --> 0.373 | Weights_l2 --> 39475.094 | Lr --> 0.004 | Seconds_per_step --> 1.979 | [2024-07-30 22:25:22,871][Main][INFO] - [train] Step 57900 out of 120000 | Loss --> 2.113 | Grad_l2 --> 0.365 | Weights_l2 --> 39501.856 | Lr --> 0.004 | Seconds_per_step --> 2.050 | [2024-07-30 22:28:41,796][Main][INFO] - [train] Step 58000 out of 120000 | Loss --> 2.119 | Grad_l2 --> 0.368 | Weights_l2 --> 39528.477 | Lr --> 0.004 | Seconds_per_step --> 1.989 | [2024-07-30 22:32:02,549][Main][INFO] - [train] Step 58100 out of 120000 | Loss --> 2.120 | Grad_l2 --> 0.367 | Weights_l2 --> 39555.357 | Lr --> 0.004 | Seconds_per_step --> 2.008 | [2024-07-30 22:35:25,784][Main][INFO] - [train] Step 58200 out of 120000 | Loss --> 2.115 | Grad_l2 --> 0.362 | Weights_l2 --> 39582.089 | 
Lr --> 0.004 | Seconds_per_step --> 2.032 | [2024-07-30 22:38:45,076][Main][INFO] - [train] Step 58300 out of 120000 | Loss --> 2.119 | Grad_l2 --> 0.374 | Weights_l2 --> 39608.843 | Lr --> 0.004 | Seconds_per_step --> 1.993 | [2024-07-30 22:42:02,935][Main][INFO] - [train] Step 58400 out of 120000 | Loss --> 2.111 | Grad_l2 --> 0.374 | Weights_l2 --> 39635.784 | Lr --> 0.004 | Seconds_per_step --> 1.979 | [2024-07-30 22:45:27,153][Main][INFO] - [train] Step 58500 out of 120000 | Loss --> 2.122 | Grad_l2 --> 0.367 | Weights_l2 --> 39662.759 | Lr --> 0.004 | Seconds_per_step --> 2.042 | [2024-07-30 22:48:48,106][Main][INFO] - [train] Step 58600 out of 120000 | Loss --> 2.132 | Grad_l2 --> 0.366 | Weights_l2 --> 39690.216 | Lr --> 0.004 | Seconds_per_step --> 2.010 | [2024-07-30 22:52:10,554][Main][INFO] - [train] Step 58700 out of 120000 | Loss --> 2.137 | Grad_l2 --> 0.369 | Weights_l2 --> 39717.770 | Lr --> 0.004 | Seconds_per_step --> 2.024 | [2024-07-30 22:55:32,892][Main][INFO] - [train] Step 58800 out of 120000 | Loss --> 2.148 | Grad_l2 --> 0.378 | Weights_l2 --> 39745.622 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-30 22:58:54,594][Main][INFO] - [train] Step 58900 out of 120000 | Loss --> 2.149 | Grad_l2 --> 0.372 | Weights_l2 --> 39772.857 | Lr --> 0.004 | Seconds_per_step --> 2.017 | [2024-07-30 23:02:17,499][Main][INFO] - [train] Step 59000 out of 120000 | Loss --> 2.165 | Grad_l2 --> 0.368 | Weights_l2 --> 39800.480 | Lr --> 0.004 | Seconds_per_step --> 2.029 | [2024-07-30 23:05:39,136][Main][INFO] - [train] Step 59100 out of 120000 | Loss --> 2.162 | Grad_l2 --> 0.366 | Weights_l2 --> 39827.839 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-30 23:08:59,548][Main][INFO] - [train] Step 59200 out of 120000 | Loss --> 2.160 | Grad_l2 --> 0.370 | Weights_l2 --> 39855.464 | Lr --> 0.004 | Seconds_per_step --> 2.004 | [2024-07-30 23:12:22,754][Main][INFO] - [train] Step 59300 out of 120000 | Loss --> 2.179 | Grad_l2 --> 0.366 | Weights_l2 
--> 39883.364 | Lr --> 0.004 | Seconds_per_step --> 2.032 | [2024-07-30 23:15:44,751][Main][INFO] - [train] Step 59400 out of 120000 | Loss --> 2.191 | Grad_l2 --> 0.370 | Weights_l2 --> 39911.365 | Lr --> 0.004 | Seconds_per_step --> 2.020 | [2024-07-30 23:19:07,011][Main][INFO] - [train] Step 59500 out of 120000 | Loss --> 2.203 | Grad_l2 --> 0.370 | Weights_l2 --> 39939.245 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-30 23:22:30,616][Main][INFO] - [train] Step 59600 out of 120000 | Loss --> 2.183 | Grad_l2 --> 0.373 | Weights_l2 --> 39966.746 | Lr --> 0.004 | Seconds_per_step --> 2.036 | [2024-07-30 23:25:53,651][Main][INFO] - [train] Step 59700 out of 120000 | Loss --> 2.176 | Grad_l2 --> 0.369 | Weights_l2 --> 39994.233 | Lr --> 0.004 | Seconds_per_step --> 2.030 | [2024-07-30 23:29:13,939][Main][INFO] - [train] Step 59800 out of 120000 | Loss --> 2.184 | Grad_l2 --> 0.371 | Weights_l2 --> 40021.610 | Lr --> 0.004 | Seconds_per_step --> 2.003 | [2024-07-30 23:32:38,834][Main][INFO] - [train] Step 59900 out of 120000 | Loss --> 2.184 | Grad_l2 --> 0.369 | Weights_l2 --> 40048.813 | Lr --> 0.004 | Seconds_per_step --> 2.049 | [2024-07-30 23:35:59,991][Main][INFO] - [train] Step 60000 out of 120000 | Loss --> 2.176 | Grad_l2 --> 0.379 | Weights_l2 --> 40076.540 | Lr --> 0.004 | Seconds_per_step --> 2.012 | [2024-07-30 23:46:59,093][Main][INFO] - [eval] Step 60000 out of 120000 | Loss --> 2.245 | Accuracy --> 0.599 | Time --> 659.099 | [2024-07-30 23:46:59,100][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-60000 [2024-07-30 23:46:59,105][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-30 23:47:02,355][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-60000/model.safetensors [2024-07-30 23:47:02,417][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-60000/optimizer.bin [2024-07-30 23:47:02,426][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-60000/scheduler.bin [2024-07-30 23:47:02,426][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-60000/sampler.bin [2024-07-30 23:47:02,426][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-60000/sampler_1.bin [2024-07-30 23:47:02,436][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-60000/random_states_0.pkl [2024-07-30 23:50:26,353][Main][INFO] - [train] Step 60100 out of 120000 | Loss --> 2.182 | Grad_l2 --> 0.367 | Weights_l2 --> 40103.996 | Lr --> 0.004 | Seconds_per_step --> 2.073 | [2024-07-30 23:53:49,658][Main][INFO] - [train] Step 60200 out of 120000 | Loss --> 2.199 | Grad_l2 --> 0.373 | Weights_l2 --> 40131.363 | Lr --> 0.004 | Seconds_per_step --> 2.033 | [2024-07-30 23:57:09,072][Main][INFO] - [train] Step 60300 out of 120000 | Loss --> 2.197 | Grad_l2 --> 0.371 | Weights_l2 --> 40158.517 | Lr --> 0.004 | Seconds_per_step --> 1.994 | [2024-07-31 00:00:32,246][Main][INFO] - [train] Step 60400 out of 120000 | Loss --> 2.198 | Grad_l2 --> 0.377 | Weights_l2 --> 40185.804 | Lr --> 0.004 | Seconds_per_step --> 2.032 | [2024-07-31 00:03:54,498][Main][INFO] - [train] Step 60500 out of 120000 | Loss --> 2.211 | Grad_l2 --> 0.376 | Weights_l2 --> 40213.067 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-31 00:07:17,083][Main][INFO] - [train] Step 60600 out of 120000 | Loss --> 2.215 | Grad_l2 --> 0.367 | Weights_l2 --> 40240.556 | Lr --> 0.004 | Seconds_per_step --> 2.026 | [2024-07-31 00:10:39,207][Main][INFO] - [train] Step 60700 
out of 120000 | Loss --> 2.227 | Grad_l2 --> 0.372 | Weights_l2 --> 40267.765 | Lr --> 0.004 | Seconds_per_step --> 2.021 | [2024-07-31 00:14:03,543][Main][INFO] - [train] Step 60800 out of 120000 | Loss --> 2.223 | Grad_l2 --> 0.381 | Weights_l2 --> 40294.705 | Lr --> 0.004 | Seconds_per_step --> 2.043 | [2024-07-31 00:17:26,471][Main][INFO] - [train] Step 60900 out of 120000 | Loss --> 2.232 | Grad_l2 --> 0.375 | Weights_l2 --> 40321.379 | Lr --> 0.004 | Seconds_per_step --> 2.029 | [2024-07-31 00:20:48,430][Main][INFO] - [train] Step 61000 out of 120000 | Loss --> 2.241 | Grad_l2 --> 0.374 | Weights_l2 --> 40348.204 | Lr --> 0.004 | Seconds_per_step --> 2.020 | [2024-07-31 00:24:11,541][Main][INFO] - [train] Step 61100 out of 120000 | Loss --> 2.242 | Grad_l2 --> 0.375 | Weights_l2 --> 40375.182 | Lr --> 0.004 | Seconds_per_step --> 2.031 | [2024-07-31 00:27:33,479][Main][INFO] - [train] Step 61200 out of 120000 | Loss --> 2.239 | Grad_l2 --> 0.374 | Weights_l2 --> 40402.545 | Lr --> 0.004 | Seconds_per_step --> 2.019 | [2024-07-31 00:30:56,501][Main][INFO] - [train] Step 61300 out of 120000 | Loss --> 2.230 | Grad_l2 --> 0.375 | Weights_l2 --> 40429.064 | Lr --> 0.004 | Seconds_per_step --> 2.030 | [2024-07-31 00:34:19,285][Main][INFO] - [train] Step 61400 out of 120000 | Loss --> 2.227 | Grad_l2 --> 0.369 | Weights_l2 --> 40455.613 | Lr --> 0.004 | Seconds_per_step --> 2.028 | [2024-07-31 00:37:40,891][Main][INFO] - [train] Step 61500 out of 120000 | Loss --> 2.229 | Grad_l2 --> 0.368 | Weights_l2 --> 40482.364 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-31 00:41:04,182][Main][INFO] - [train] Step 61600 out of 120000 | Loss --> 2.220 | Grad_l2 --> 0.377 | Weights_l2 --> 40509.160 | Lr --> 0.004 | Seconds_per_step --> 2.033 | [2024-07-31 00:44:27,948][Main][INFO] - [train] Step 61700 out of 120000 | Loss --> 2.224 | Grad_l2 --> 0.371 | Weights_l2 --> 40536.115 | Lr --> 0.004 | Seconds_per_step --> 2.038 | [2024-07-31 00:47:50,239][Main][INFO] - 
[train] Step 61800 out of 120000 | Loss --> 2.213 | Grad_l2 --> 0.372 | Weights_l2 --> 40563.142 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-31 00:51:14,210][Main][INFO] - [train] Step 61900 out of 120000 | Loss --> 2.209 | Grad_l2 --> 0.372 | Weights_l2 --> 40590.139 | Lr --> 0.004 | Seconds_per_step --> 2.039 | [2024-07-31 00:54:36,463][Main][INFO] - [train] Step 62000 out of 120000 | Loss --> 2.216 | Grad_l2 --> 0.374 | Weights_l2 --> 40617.300 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-31 00:58:00,023][Main][INFO] - [train] Step 62100 out of 120000 | Loss --> 2.208 | Grad_l2 --> 0.374 | Weights_l2 --> 40644.092 | Lr --> 0.004 | Seconds_per_step --> 2.036 | [2024-07-31 01:01:22,351][Main][INFO] - [train] Step 62200 out of 120000 | Loss --> 2.226 | Grad_l2 --> 0.372 | Weights_l2 --> 40671.224 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-31 01:04:46,753][Main][INFO] - [train] Step 62300 out of 120000 | Loss --> 2.213 | Grad_l2 --> 0.368 | Weights_l2 --> 40698.209 | Lr --> 0.004 | Seconds_per_step --> 2.044 | [2024-07-31 01:08:08,994][Main][INFO] - [train] Step 62400 out of 120000 | Loss --> 2.228 | Grad_l2 --> 0.372 | Weights_l2 --> 40725.049 | Lr --> 0.004 | Seconds_per_step --> 2.022 | [2024-07-31 01:11:32,853][Main][INFO] - [train] Step 62500 out of 120000 | Loss --> 2.226 | Grad_l2 --> 0.375 | Weights_l2 --> 40751.569 | Lr --> 0.004 | Seconds_per_step --> 2.039 | [2024-07-31 01:14:56,108][Main][INFO] - [train] Step 62600 out of 120000 | Loss --> 2.231 | Grad_l2 --> 0.370 | Weights_l2 --> 40778.500 | Lr --> 0.004 | Seconds_per_step --> 2.033 | [2024-07-31 01:18:14,794][Main][INFO] - [train] Step 62700 out of 120000 | Loss --> 2.246 | Grad_l2 --> 0.378 | Weights_l2 --> 40805.844 | Lr --> 0.004 | Seconds_per_step --> 1.987 | [2024-07-31 01:21:39,822][Main][INFO] - [train] Step 62800 out of 120000 | Loss --> 2.225 | Grad_l2 --> 0.369 | Weights_l2 --> 40833.144 | Lr --> 0.004 | Seconds_per_step --> 2.050 | [2024-07-31 
01:25:00,772][Main][INFO] - [train] Step 62900 out of 120000 | Loss --> 2.235 | Grad_l2 --> 0.371 | Weights_l2 --> 40860.525 | Lr --> 0.004 | Seconds_per_step --> 2.009 | [2024-07-31 01:28:24,174][Main][INFO] - [train] Step 63000 out of 120000 | Loss --> 2.248 | Grad_l2 --> 0.370 | Weights_l2 --> 40887.564 | Lr --> 0.004 | Seconds_per_step --> 2.034 | [2024-07-31 01:31:45,990][Main][INFO] - [train] Step 63100 out of 120000 | Loss --> 2.246 | Grad_l2 --> 0.375 | Weights_l2 --> 40914.838 | Lr --> 0.004 | Seconds_per_step --> 2.018 | [2024-07-31 01:35:07,656][Main][INFO] - [train] Step 63200 out of 120000 | Loss --> 2.247 | Grad_l2 --> 0.379 | Weights_l2 --> 40941.770 | Lr --> 0.004 | Seconds_per_step --> 2.017 | [2024-07-31 01:38:32,902][Main][INFO] - [train] Step 63300 out of 120000 | Loss --> 2.254 | Grad_l2 --> 0.377 | Weights_l2 --> 40968.832 | Lr --> 0.004 | Seconds_per_step --> 2.052 | [2024-07-31 01:41:53,002][Main][INFO] - [train] Step 63400 out of 120000 | Loss --> 2.253 | Grad_l2 --> 0.378 | Weights_l2 --> 40995.939 | Lr --> 0.004 | Seconds_per_step --> 2.001 | [2024-07-31 01:45:13,095][Main][INFO] - [train] Step 63500 out of 120000 | Loss --> 2.264 | Grad_l2 --> 0.378 | Weights_l2 --> 41023.429 | Lr --> 0.004 | Seconds_per_step --> 2.001 | [2024-07-31 01:48:38,284][Main][INFO] - [train] Step 63600 out of 120000 | Loss --> 2.259 | Grad_l2 --> 0.377 | Weights_l2 --> 41050.044 | Lr --> 0.004 | Seconds_per_step --> 2.052 | [2024-07-31 01:51:59,446][Main][INFO] - [train] Step 63700 out of 120000 | Loss --> 2.249 | Grad_l2 --> 0.376 | Weights_l2 --> 41076.147 | Lr --> 0.004 | Seconds_per_step --> 2.012 | [2024-07-31 01:55:20,402][Main][INFO] - [train] Step 63800 out of 120000 | Loss --> 2.233 | Grad_l2 --> 0.372 | Weights_l2 --> 41102.247 | Lr --> 0.004 | Seconds_per_step --> 2.010 | [2024-07-31 01:58:45,546][Main][INFO] - [train] Step 63900 out of 120000 | Loss --> 2.235 | Grad_l2 --> 0.379 | Weights_l2 --> 41128.440 | Lr --> 0.004 | Seconds_per_step --> 2.051 
| [2024-07-31 02:02:05,968][Main][INFO] - [train] Step 64000 out of 120000 | Loss --> 2.228 | Grad_l2 --> 0.370 | Weights_l2 --> 41154.575 | Lr --> 0.004 | Seconds_per_step --> 2.004 | [2024-07-31 02:05:28,085][Main][INFO] - [train] Step 64100 out of 120000 | Loss --> 2.204 | Grad_l2 --> 0.374 | Weights_l2 --> 41180.172 | Lr --> 0.004 | Seconds_per_step --> 2.021 | [2024-07-31 02:08:52,479][Main][INFO] - [train] Step 64200 out of 120000 | Loss --> 2.188 | Grad_l2 --> 0.374 | Weights_l2 --> 41206.021 | Lr --> 0.004 | Seconds_per_step --> 2.044 | [2024-07-31 02:12:09,951][Main][INFO] - [train] Step 64300 out of 120000 | Loss --> 2.191 | Grad_l2 --> 0.373 | Weights_l2 --> 41231.156 | Lr --> 0.004 | Seconds_per_step --> 1.975 | [2024-07-31 02:15:34,187][Main][INFO] - [train] Step 64400 out of 120000 | Loss --> 2.178 | Grad_l2 --> 0.369 | Weights_l2 --> 41256.629 | Lr --> 0.004 | Seconds_per_step --> 2.042 | [2024-07-31 02:18:56,279][Main][INFO] - [train] Step 64500 out of 120000 | Loss --> 2.175 | Grad_l2 --> 0.378 | Weights_l2 --> 41281.628 | Lr --> 0.004 | Seconds_per_step --> 2.021 | [2024-07-31 02:22:16,512][Main][INFO] - [train] Step 64600 out of 120000 | Loss --> 2.162 | Grad_l2 --> 0.370 | Weights_l2 --> 41306.742 | Lr --> 0.004 | Seconds_per_step --> 2.002 | [2024-07-31 02:25:41,078][Main][INFO] - [train] Step 64700 out of 120000 | Loss --> 2.158 | Grad_l2 --> 0.365 | Weights_l2 --> 41331.853 | Lr --> 0.004 | Seconds_per_step --> 2.046 | [2024-07-31 02:29:02,436][Main][INFO] - [train] Step 64800 out of 120000 | Loss --> 2.158 | Grad_l2 --> 0.365 | Weights_l2 --> 41356.704 | Lr --> 0.004 | Seconds_per_step --> 2.014 | [2024-07-31 02:32:21,435][Main][INFO] - [train] Step 64900 out of 120000 | Loss --> 2.157 | Grad_l2 --> 0.372 | Weights_l2 --> 41381.690 | Lr --> 0.004 | Seconds_per_step --> 1.990 | [2024-07-31 02:35:46,177][Main][INFO] - [train] Step 65000 out of 120000 | Loss --> 2.148 | Grad_l2 --> 0.370 | Weights_l2 --> 41406.728 | Lr --> 0.004 | 
Seconds_per_step --> 2.047 | [2024-07-31 02:46:47,532][Main][INFO] - [eval] Step 65000 out of 120000 | Loss --> 2.218 | Accuracy --> 0.602 | Time --> 661.352 | [2024-07-31 02:50:10,406][Main][INFO] - [train] Step 65100 out of 120000 | Loss --> 2.147 | Grad_l2 --> 0.366 | Weights_l2 --> 41431.937 | Lr --> 0.004 | Seconds_per_step --> 2.029 | [2024-07-31 02:53:30,655][Main][INFO] - [train] Step 65200 out of 120000 | Loss --> 2.134 | Grad_l2 --> 0.368 | Weights_l2 --> 41457.062 | Lr --> 0.004 | Seconds_per_step --> 2.002 | [2024-07-31 02:56:54,428][Main][INFO] - [train] Step 65300 out of 120000 | Loss --> 2.125 | Grad_l2 --> 0.374 | Weights_l2 --> 41481.540 | Lr --> 0.004 | Seconds_per_step --> 2.038 | [2024-07-31 03:00:15,752][Main][INFO] - [train] Step 65400 out of 120000 | Loss --> 2.133 | Grad_l2 --> 0.364 | Weights_l2 --> 41506.573 | Lr --> 0.004 | Seconds_per_step --> 2.013 | [2024-07-31 03:03:35,040][Main][INFO] - [train] Step 65500 out of 120000 | Loss --> 2.120 | Grad_l2 --> 0.371 | Weights_l2 --> 41530.900 | Lr --> 0.004 | Seconds_per_step --> 1.993 | [2024-07-31 03:06:58,722][Main][INFO] - [train] Step 65600 out of 120000 | Loss --> 2.129 | Grad_l2 --> 0.363 | Weights_l2 --> 41555.614 | Lr --> 0.004 | Seconds_per_step --> 2.037 | [2024-07-31 03:10:20,142][Main][INFO] - [train] Step 65700 out of 120000 | Loss --> 2.119 | Grad_l2 --> 0.374 | Weights_l2 --> 41580.307 | Lr --> 0.004 | Seconds_per_step --> 2.014 | [2024-07-31 03:13:41,992][Main][INFO] - [train] Step 65800 out of 120000 | Loss --> 2.118 | Grad_l2 --> 0.371 | Weights_l2 --> 41605.001 | Lr --> 0.004 | Seconds_per_step --> 2.018 | [2024-07-31 03:17:06,070][Main][INFO] - [train] Step 65900 out of 120000 | Loss --> 2.116 | Grad_l2 --> 0.369 | Weights_l2 --> 41629.736 | Lr --> 0.004 | Seconds_per_step --> 2.041 | [2024-07-31 03:20:26,669][Main][INFO] - [train] Step 66000 out of 120000 | Loss --> 2.105 | Grad_l2 --> 0.360 | Weights_l2 --> 41654.069 | Lr --> 0.004 | Seconds_per_step --> 2.006 | 
[2024-07-31 03:23:48,961][Main][INFO] - [train] Step 66100 out of 120000 | Loss --> 2.120 | Grad_l2 --> 0.387 | Weights_l2 --> 41678.130 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-31 03:27:10,488][Main][INFO] - [train] Step 66200 out of 120000 | Loss --> 2.112 | Grad_l2 --> 0.362 | Weights_l2 --> 41702.107 | Lr --> 0.004 | Seconds_per_step --> 2.015 | [2024-07-31 03:30:34,237][Main][INFO] - [train] Step 66300 out of 120000 | Loss --> 2.119 | Grad_l2 --> 0.365 | Weights_l2 --> 41726.715 | Lr --> 0.004 | Seconds_per_step --> 2.037 | [2024-07-31 03:33:57,766][Main][INFO] - [train] Step 66400 out of 120000 | Loss --> 2.124 | Grad_l2 --> 0.361 | Weights_l2 --> 41751.328 | Lr --> 0.004 | Seconds_per_step --> 2.035 | [2024-07-31 03:37:19,491][Main][INFO] - [train] Step 66500 out of 120000 | Loss --> 2.118 | Grad_l2 --> 0.365 | Weights_l2 --> 41775.920 | Lr --> 0.004 | Seconds_per_step --> 2.017 | [2024-07-31 03:40:40,334][Main][INFO] - [train] Step 66600 out of 120000 | Loss --> 2.131 | Grad_l2 --> 0.367 | Weights_l2 --> 41800.443 | Lr --> 0.004 | Seconds_per_step --> 2.008 | [2024-07-31 03:44:02,781][Main][INFO] - [train] Step 66700 out of 120000 | Loss --> 2.126 | Grad_l2 --> 0.368 | Weights_l2 --> 41824.599 | Lr --> 0.004 | Seconds_per_step --> 2.024 | [2024-07-31 03:47:21,991][Main][INFO] - [train] Step 66800 out of 120000 | Loss --> 2.127 | Grad_l2 --> 0.368 | Weights_l2 --> 41848.509 | Lr --> 0.004 | Seconds_per_step --> 1.992 | [2024-07-31 03:50:42,253][Main][INFO] - [train] Step 66900 out of 120000 | Loss --> 2.120 | Grad_l2 --> 0.366 | Weights_l2 --> 41873.172 | Lr --> 0.004 | Seconds_per_step --> 2.003 | [2024-07-31 03:54:05,654][Main][INFO] - [train] Step 67000 out of 120000 | Loss --> 2.129 | Grad_l2 --> 0.373 | Weights_l2 --> 41897.339 | Lr --> 0.004 | Seconds_per_step --> 2.034 | [2024-07-31 03:57:25,751][Main][INFO] - [train] Step 67100 out of 120000 | Loss --> 2.142 | Grad_l2 --> 0.366 | Weights_l2 --> 41921.541 | Lr --> 0.004 | 
Seconds_per_step --> 2.001 | [2024-07-31 04:00:46,761][Main][INFO] - [train] Step 67200 out of 120000 | Loss --> 2.135 | Grad_l2 --> 0.364 | Weights_l2 --> 41945.850 | Lr --> 0.004 | Seconds_per_step --> 2.010 | [2024-07-31 04:04:11,041][Main][INFO] - [train] Step 67300 out of 120000 | Loss --> 2.128 | Grad_l2 --> 0.371 | Weights_l2 --> 41970.190 | Lr --> 0.004 | Seconds_per_step --> 2.043 | [2024-07-31 04:07:29,216][Main][INFO] - [train] Step 67400 out of 120000 | Loss --> 2.139 | Grad_l2 --> 0.368 | Weights_l2 --> 41994.206 | Lr --> 0.004 | Seconds_per_step --> 1.982 | [2024-07-31 04:10:50,401][Main][INFO] - [train] Step 67500 out of 120000 | Loss --> 2.135 | Grad_l2 --> 0.373 | Weights_l2 --> 42018.574 | Lr --> 0.004 | Seconds_per_step --> 2.012 | [2024-07-31 04:14:12,536][Main][INFO] - [train] Step 67600 out of 120000 | Loss --> 2.130 | Grad_l2 --> 0.366 | Weights_l2 --> 42042.846 | Lr --> 0.004 | Seconds_per_step --> 2.021 | [2024-07-31 04:17:33,186][Main][INFO] - [train] Step 67700 out of 120000 | Loss --> 2.119 | Grad_l2 --> 0.369 | Weights_l2 --> 42066.723 | Lr --> 0.004 | Seconds_per_step --> 2.006 | [2024-07-31 04:20:55,171][Main][INFO] - [train] Step 67800 out of 120000 | Loss --> 2.126 | Grad_l2 --> 0.370 | Weights_l2 --> 42090.830 | Lr --> 0.004 | Seconds_per_step --> 2.020 | [2024-07-31 04:24:16,639][Main][INFO] - [train] Step 67900 out of 120000 | Loss --> 2.128 | Grad_l2 --> 0.369 | Weights_l2 --> 42114.582 | Lr --> 0.004 | Seconds_per_step --> 2.015 | [2024-07-31 04:27:37,554][Main][INFO] - [train] Step 68000 out of 120000 | Loss --> 2.131 | Grad_l2 --> 0.363 | Weights_l2 --> 42138.265 | Lr --> 0.004 | Seconds_per_step --> 2.009 | [2024-07-31 04:31:00,654][Main][INFO] - [train] Step 68100 out of 120000 | Loss --> 2.128 | Grad_l2 --> 0.366 | Weights_l2 --> 42162.213 | Lr --> 0.004 | Seconds_per_step --> 2.031 | [2024-07-31 04:34:22,696][Main][INFO] - [train] Step 68200 out of 120000 | Loss --> 2.141 | Grad_l2 --> 0.370 | Weights_l2 --> 42186.078 | 
Lr --> 0.004 | Seconds_per_step --> 2.020 | [2024-07-31 04:37:44,053][Main][INFO] - [train] Step 68300 out of 120000 | Loss --> 2.131 | Grad_l2 --> 0.364 | Weights_l2 --> 42210.099 | Lr --> 0.004 | Seconds_per_step --> 2.014 | [2024-07-31 04:41:06,538][Main][INFO] - [train] Step 68400 out of 120000 | Loss --> 2.128 | Grad_l2 --> 0.366 | Weights_l2 --> 42233.897 | Lr --> 0.004 | Seconds_per_step --> 2.025 | [2024-07-31 04:44:28,176][Main][INFO] - [train] Step 68500 out of 120000 | Loss --> 2.141 | Grad_l2 --> 0.377 | Weights_l2 --> 42257.967 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-31 04:47:52,268][Main][INFO] - [train] Step 68600 out of 120000 | Loss --> 2.130 | Grad_l2 --> 0.370 | Weights_l2 --> 42281.596 | Lr --> 0.004 | Seconds_per_step --> 2.041 | [2024-07-31 04:51:13,820][Main][INFO] - [train] Step 68700 out of 120000 | Loss --> 2.151 | Grad_l2 --> 0.368 | Weights_l2 --> 42305.836 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-31 04:54:34,784][Main][INFO] - [train] Step 68800 out of 120000 | Loss --> 2.133 | Grad_l2 --> 0.370 | Weights_l2 --> 42329.944 | Lr --> 0.004 | Seconds_per_step --> 2.010 | [2024-07-31 04:57:55,991][Main][INFO] - [train] Step 68900 out of 120000 | Loss --> 2.139 | Grad_l2 --> 0.367 | Weights_l2 --> 42353.789 | Lr --> 0.004 | Seconds_per_step --> 2.012 | [2024-07-31 05:01:17,546][Main][INFO] - [train] Step 69000 out of 120000 | Loss --> 2.131 | Grad_l2 --> 0.365 | Weights_l2 --> 42377.901 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-31 05:04:38,109][Main][INFO] - [train] Step 69100 out of 120000 | Loss --> 2.126 | Grad_l2 --> 0.377 | Weights_l2 --> 42401.969 | Lr --> 0.004 | Seconds_per_step --> 2.006 | [2024-07-31 05:08:00,882][Main][INFO] - [train] Step 69200 out of 120000 | Loss --> 2.135 | Grad_l2 --> 0.373 | Weights_l2 --> 42426.322 | Lr --> 0.004 | Seconds_per_step --> 2.028 | [2024-07-31 05:11:23,449][Main][INFO] - [train] Step 69300 out of 120000 | Loss --> 2.124 | Grad_l2 --> 0.366 | Weights_l2 
--> 42450.111 | Lr --> 0.004 | Seconds_per_step --> 2.026 | [2024-07-31 05:14:46,310][Main][INFO] - [train] Step 69400 out of 120000 | Loss --> 2.136 | Grad_l2 --> 0.364 | Weights_l2 --> 42474.310 | Lr --> 0.004 | Seconds_per_step --> 2.029 | [2024-07-31 05:18:08,978][Main][INFO] - [train] Step 69500 out of 120000 | Loss --> 2.141 | Grad_l2 --> 0.369 | Weights_l2 --> 42498.926 | Lr --> 0.004 | Seconds_per_step --> 2.027 | [2024-07-31 05:21:29,439][Main][INFO] - [train] Step 69600 out of 120000 | Loss --> 2.137 | Grad_l2 --> 0.366 | Weights_l2 --> 42523.250 | Lr --> 0.004 | Seconds_per_step --> 2.005 | [2024-07-31 05:24:52,936][Main][INFO] - [train] Step 69700 out of 120000 | Loss --> 2.142 | Grad_l2 --> 0.386 | Weights_l2 --> 42547.252 | Lr --> 0.004 | Seconds_per_step --> 2.035 | [2024-07-31 05:28:15,353][Main][INFO] - [train] Step 69800 out of 120000 | Loss --> 2.148 | Grad_l2 --> 0.370 | Weights_l2 --> 42571.408 | Lr --> 0.004 | Seconds_per_step --> 2.024 | [2024-07-31 05:31:34,902][Main][INFO] - [train] Step 69900 out of 120000 | Loss --> 2.172 | Grad_l2 --> 0.368 | Weights_l2 --> 42595.896 | Lr --> 0.004 | Seconds_per_step --> 1.995 | [2024-07-31 05:34:57,667][Main][INFO] - [train] Step 70000 out of 120000 | Loss --> 2.162 | Grad_l2 --> 0.366 | Weights_l2 --> 42620.520 | Lr --> 0.004 | Seconds_per_step --> 2.028 | [2024-07-31 05:46:01,528][Main][INFO] - [eval] Step 70000 out of 120000 | Loss --> 2.195 | Accuracy --> 0.604 | Time --> 663.859 | [2024-07-31 05:46:01,533][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-70000 [2024-07-31 05:46:01,538][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-31 05:46:04,684][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-70000/model.safetensors [2024-07-31 05:46:04,733][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-70000/optimizer.bin [2024-07-31 05:46:04,735][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-70000/scheduler.bin [2024-07-31 05:46:04,735][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-70000/sampler.bin [2024-07-31 05:46:04,735][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-70000/sampler_1.bin [2024-07-31 05:46:04,736][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-70000/random_states_0.pkl [2024-07-31 05:49:27,056][Main][INFO] - [train] Step 70100 out of 120000 | Loss --> 2.164 | Grad_l2 --> 0.373 | Weights_l2 --> 42645.632 | Lr --> 0.004 | Seconds_per_step --> 2.055 | [2024-07-31 05:52:50,160][Main][INFO] - [train] Step 70200 out of 120000 | Loss --> 2.173 | Grad_l2 --> 0.368 | Weights_l2 --> 42669.851 | Lr --> 0.004 | Seconds_per_step --> 2.031 | [2024-07-31 05:56:14,354][Main][INFO] - [train] Step 70300 out of 120000 | Loss --> 2.130 | Grad_l2 --> 0.362 | Weights_l2 --> 42694.186 | Lr --> 0.004 | Seconds_per_step --> 2.042 | [2024-07-31 05:59:34,322][Main][INFO] - [train] Step 70400 out of 120000 | Loss --> 2.109 | Grad_l2 --> 0.363 | Weights_l2 --> 42718.424 | Lr --> 0.004 | Seconds_per_step --> 2.000 | [2024-07-31 06:02:56,005][Main][INFO] - [train] Step 70500 out of 120000 | Loss --> 2.096 | Grad_l2 --> 0.368 | Weights_l2 --> 42742.584 | Lr --> 0.004 | Seconds_per_step --> 2.017 | [2024-07-31 06:06:18,305][Main][INFO] - [train] Step 70600 out of 120000 | Loss --> 2.100 | Grad_l2 --> 0.365 | Weights_l2 --> 42767.206 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-31 06:09:39,778][Main][INFO] - [train] Step 70700 
out of 120000 | Loss --> 2.109 | Grad_l2 --> 0.367 | Weights_l2 --> 42791.236 | Lr --> 0.004 | Seconds_per_step --> 2.015 | [2024-07-31 06:13:04,685][Main][INFO] - [train] Step 70800 out of 120000 | Loss --> 2.099 | Grad_l2 --> 0.371 | Weights_l2 --> 42814.683 | Lr --> 0.004 | Seconds_per_step --> 2.049 | [2024-07-31 06:16:25,557][Main][INFO] - [train] Step 70900 out of 120000 | Loss --> 2.100 | Grad_l2 --> 0.358 | Weights_l2 --> 42838.327 | Lr --> 0.004 | Seconds_per_step --> 2.009 | [2024-07-31 06:19:46,751][Main][INFO] - [train] Step 71000 out of 120000 | Loss --> 2.110 | Grad_l2 --> 0.363 | Weights_l2 --> 42861.775 | Lr --> 0.004 | Seconds_per_step --> 2.012 | [2024-07-31 06:23:12,191][Main][INFO] - [train] Step 71100 out of 120000 | Loss --> 2.112 | Grad_l2 --> 0.366 | Weights_l2 --> 42885.825 | Lr --> 0.004 | Seconds_per_step --> 2.054 | [2024-07-31 06:26:32,637][Main][INFO] - [train] Step 71200 out of 120000 | Loss --> 2.113 | Grad_l2 --> 0.366 | Weights_l2 --> 42910.100 | Lr --> 0.004 | Seconds_per_step --> 2.004 | [2024-07-31 06:29:52,033][Main][INFO] - [train] Step 71300 out of 120000 | Loss --> 2.127 | Grad_l2 --> 0.372 | Weights_l2 --> 42934.046 | Lr --> 0.004 | Seconds_per_step --> 1.994 | [2024-07-31 06:33:17,238][Main][INFO] - [train] Step 71400 out of 120000 | Loss --> 2.124 | Grad_l2 --> 0.370 | Weights_l2 --> 42958.164 | Lr --> 0.004 | Seconds_per_step --> 2.052 | [2024-07-31 06:36:38,764][Main][INFO] - [train] Step 71500 out of 120000 | Loss --> 2.124 | Grad_l2 --> 0.375 | Weights_l2 --> 42982.016 | Lr --> 0.004 | Seconds_per_step --> 2.015 | [2024-07-31 06:39:58,721][Main][INFO] - [train] Step 71600 out of 120000 | Loss --> 2.121 | Grad_l2 --> 0.365 | Weights_l2 --> 43006.163 | Lr --> 0.004 | Seconds_per_step --> 2.000 | [2024-07-31 06:43:23,557][Main][INFO] - [train] Step 71700 out of 120000 | Loss --> 2.123 | Grad_l2 --> 0.364 | Weights_l2 --> 43029.933 | Lr --> 0.004 | Seconds_per_step --> 2.048 | [2024-07-31 06:46:43,877][Main][INFO] - 
[train] Step 71800 out of 120000 | Loss --> 2.108 | Grad_l2 --> 0.369 | Weights_l2 --> 43053.919 | Lr --> 0.004 | Seconds_per_step --> 2.003 | [2024-07-31 06:50:03,922][Main][INFO] - [train] Step 71900 out of 120000 | Loss --> 2.117 | Grad_l2 --> 0.364 | Weights_l2 --> 43077.380 | Lr --> 0.004 | Seconds_per_step --> 2.000 | [2024-07-31 06:53:29,983][Main][INFO] - [train] Step 72000 out of 120000 | Loss --> 2.090 | Grad_l2 --> 0.374 | Weights_l2 --> 43101.012 | Lr --> 0.004 | Seconds_per_step --> 2.061 | [2024-07-31 06:56:49,892][Main][INFO] - [train] Step 72100 out of 120000 | Loss --> 2.087 | Grad_l2 --> 0.367 | Weights_l2 --> 43124.752 | Lr --> 0.004 | Seconds_per_step --> 1.999 | [2024-07-31 07:00:09,497][Main][INFO] - [train] Step 72200 out of 120000 | Loss --> 2.088 | Grad_l2 --> 0.361 | Weights_l2 --> 43148.323 | Lr --> 0.004 | Seconds_per_step --> 1.996 | [2024-07-31 07:03:34,410][Main][INFO] - [train] Step 72300 out of 120000 | Loss --> 2.075 | Grad_l2 --> 0.364 | Weights_l2 --> 43171.759 | Lr --> 0.004 | Seconds_per_step --> 2.049 | [2024-07-31 07:06:54,747][Main][INFO] - [train] Step 72400 out of 120000 | Loss --> 2.082 | Grad_l2 --> 0.361 | Weights_l2 --> 43194.952 | Lr --> 0.004 | Seconds_per_step --> 2.003 | [2024-07-31 07:10:15,863][Main][INFO] - [train] Step 72500 out of 120000 | Loss --> 2.070 | Grad_l2 --> 0.361 | Weights_l2 --> 43218.248 | Lr --> 0.004 | Seconds_per_step --> 2.011 | [2024-07-31 07:13:39,364][Main][INFO] - [train] Step 72600 out of 120000 | Loss --> 2.074 | Grad_l2 --> 0.369 | Weights_l2 --> 43241.281 | Lr --> 0.004 | Seconds_per_step --> 2.035 | [2024-07-31 07:17:00,634][Main][INFO] - [train] Step 72700 out of 120000 | Loss --> 2.063 | Grad_l2 --> 0.371 | Weights_l2 --> 43264.370 | Lr --> 0.004 | Seconds_per_step --> 2.013 | [2024-07-31 07:20:21,062][Main][INFO] - [train] Step 72800 out of 120000 | Loss --> 2.068 | Grad_l2 --> 0.363 | Weights_l2 --> 43287.746 | Lr --> 0.004 | Seconds_per_step --> 2.004 | [2024-07-31 
07:23:45,238][Main][INFO] - [train] Step 72900 out of 120000 | Loss --> 2.060 | Grad_l2 --> 0.367 | Weights_l2 --> 43311.161 | Lr --> 0.004 | Seconds_per_step --> 2.042 | [2024-07-31 07:27:06,222][Main][INFO] - [train] Step 73000 out of 120000 | Loss --> 2.023 | Grad_l2 --> 0.358 | Weights_l2 --> 43335.219 | Lr --> 0.004 | Seconds_per_step --> 2.010 | [2024-07-31 07:30:28,355][Main][INFO] - [train] Step 73100 out of 120000 | Loss --> 2.010 | Grad_l2 --> 0.365 | Weights_l2 --> 43359.404 | Lr --> 0.004 | Seconds_per_step --> 2.021 | [2024-07-31 07:33:52,350][Main][INFO] - [train] Step 73200 out of 120000 | Loss --> 2.017 | Grad_l2 --> 0.365 | Weights_l2 --> 43383.254 | Lr --> 0.004 | Seconds_per_step --> 2.040 | [2024-07-31 07:37:14,154][Main][INFO] - [train] Step 73300 out of 120000 | Loss --> 2.016 | Grad_l2 --> 0.368 | Weights_l2 --> 43407.606 | Lr --> 0.004 | Seconds_per_step --> 2.018 | [2024-07-31 07:40:35,554][Main][INFO] - [train] Step 73400 out of 120000 | Loss --> 2.017 | Grad_l2 --> 0.365 | Weights_l2 --> 43431.235 | Lr --> 0.004 | Seconds_per_step --> 2.014 | [2024-07-31 07:43:57,891][Main][INFO] - [train] Step 73500 out of 120000 | Loss --> 2.010 | Grad_l2 --> 0.359 | Weights_l2 --> 43455.402 | Lr --> 0.004 | Seconds_per_step --> 2.023 | [2024-07-31 07:47:23,040][Main][INFO] - [train] Step 73600 out of 120000 | Loss --> 2.033 | Grad_l2 --> 0.368 | Weights_l2 --> 43479.020 | Lr --> 0.004 | Seconds_per_step --> 2.051 | [2024-07-31 07:50:44,360][Main][INFO] - [train] Step 73700 out of 120000 | Loss --> 2.032 | Grad_l2 --> 0.364 | Weights_l2 --> 43502.747 | Lr --> 0.004 | Seconds_per_step --> 2.013 | [2024-07-31 07:54:04,824][Main][INFO] - [train] Step 73800 out of 120000 | Loss --> 2.035 | Grad_l2 --> 0.364 | Weights_l2 --> 43526.129 | Lr --> 0.004 | Seconds_per_step --> 2.005 | [2024-07-31 07:57:25,393][Main][INFO] - [train] Step 73900 out of 120000 | Loss --> 2.047 | Grad_l2 --> 0.372 | Weights_l2 --> 43549.803 | Lr --> 0.004 | Seconds_per_step --> 2.006 
| [2024-07-31 08:00:48,695][Main][INFO] - [train] Step 74000 out of 120000 | Loss --> 2.051 | Grad_l2 --> 0.363 | Weights_l2 --> 43573.713 | Lr --> 0.004 | Seconds_per_step --> 2.033 | [2024-07-31 08:04:11,069][Main][INFO] - [train] Step 74100 out of 120000 | Loss --> 2.057 | Grad_l2 --> 0.367 | Weights_l2 --> 43597.343 | Lr --> 0.004 | Seconds_per_step --> 2.024 | [2024-07-31 08:07:32,971][Main][INFO] - [train] Step 74200 out of 120000 | Loss --> 2.061 | Grad_l2 --> 0.367 | Weights_l2 --> 43620.945 | Lr --> 0.004 | Seconds_per_step --> 2.019 | [2024-07-31 08:10:58,461][Main][INFO] - [train] Step 74300 out of 120000 | Loss --> 2.058 | Grad_l2 --> 0.367 | Weights_l2 --> 43644.167 | Lr --> 0.004 | Seconds_per_step --> 2.055 | [2024-07-31 08:14:21,019][Main][INFO] - [train] Step 74400 out of 120000 | Loss --> 2.071 | Grad_l2 --> 0.362 | Weights_l2 --> 43668.408 | Lr --> 0.004 | Seconds_per_step --> 2.026 | [2024-07-31 08:17:42,596][Main][INFO] - [train] Step 74500 out of 120000 | Loss --> 2.076 | Grad_l2 --> 0.363 | Weights_l2 --> 43692.121 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-31 08:21:07,057][Main][INFO] - [train] Step 74600 out of 120000 | Loss --> 2.081 | Grad_l2 --> 0.365 | Weights_l2 --> 43716.134 | Lr --> 0.004 | Seconds_per_step --> 2.045 | [2024-07-31 08:24:28,744][Main][INFO] - [train] Step 74700 out of 120000 | Loss --> 2.062 | Grad_l2 --> 0.358 | Weights_l2 --> 43740.211 | Lr --> 0.004 | Seconds_per_step --> 2.017 | [2024-07-31 08:27:52,304][Main][INFO] - [train] Step 74800 out of 120000 | Loss --> 2.070 | Grad_l2 --> 0.365 | Weights_l2 --> 43764.070 | Lr --> 0.004 | Seconds_per_step --> 2.036 | [2024-07-31 08:31:13,921][Main][INFO] - [train] Step 74900 out of 120000 | Loss --> 2.063 | Grad_l2 --> 0.367 | Weights_l2 --> 43787.659 | Lr --> 0.004 | Seconds_per_step --> 2.016 | [2024-07-31 08:34:34,137][Main][INFO] - [train] Step 75000 out of 120000 | Loss --> 2.066 | Grad_l2 --> 0.365 | Weights_l2 --> 43811.480 | Lr --> 0.004 | 
Seconds_per_step --> 2.002 | [2024-07-31 08:45:45,205][Main][INFO] - [eval] Step 75000 out of 120000 | Loss --> 2.178 | Accuracy --> 0.607 | Time --> 671.065 | [2024-07-31 08:49:10,341][Main][INFO] - [train] Step 75100 out of 120000 | Loss --> 2.067 | Grad_l2 --> 0.366 | Weights_l2 --> 43835.563 | Lr --> 0.004 | Seconds_per_step --> 2.051 | [2024-07-31 08:52:31,929][Main][INFO] - [train] Step 75200 out of 120000 | Loss --> 2.065 | Grad_l2 --> 0.367 | Weights_l2 --> 43859.178 | Lr --> 0.004 | Seconds_per_step --> 2.015 | [2024-07-31 08:55:51,770][Main][INFO] - [train] Step 75300 out of 120000 | Loss --> 2.073 | Grad_l2 --> 0.369 | Weights_l2 --> 43882.947 | Lr --> 0.004 | Seconds_per_step --> 1.998 | [2024-07-31 08:59:17,167][Main][INFO] - [train] Step 75400 out of 120000 | Loss --> 2.073 | Grad_l2 --> 0.365 | Weights_l2 --> 43906.490 | Lr --> 0.004 | Seconds_per_step --> 2.054 | [2024-07-31 09:02:37,887][Main][INFO] - [train] Step 75500 out of 120000 | Loss --> 2.089 | Grad_l2 --> 0.366 | Weights_l2 --> 43929.910 | Lr --> 0.004 | Seconds_per_step --> 2.007 | [2024-07-31 09:05:58,073][Main][INFO] - [train] Step 75600 out of 120000 | Loss --> 2.080 | Grad_l2 --> 0.363 | Weights_l2 --> 43954.191 | Lr --> 0.004 | Seconds_per_step --> 2.002 | [2024-07-31 09:09:23,738][Main][INFO] - [train] Step 75700 out of 120000 | Loss --> 2.091 | Grad_l2 --> 0.362 | Weights_l2 --> 43977.433 | Lr --> 0.004 | Seconds_per_step --> 2.057 | [2024-07-31 09:12:45,534][Main][INFO] - [train] Step 75800 out of 120000 | Loss --> 2.111 | Grad_l2 --> 0.472 | Weights_l2 --> 44000.343 | Lr --> 0.004 | Seconds_per_step --> 2.018 | [2024-07-31 09:16:08,721][Main][INFO] - [train] Step 75900 out of 120000 | Loss --> 2.079 | Grad_l2 --> 0.364 | Weights_l2 --> 44023.692 | Lr --> 0.004 | Seconds_per_step --> 2.032 | [2024-07-31 09:19:32,552][Main][INFO] - [train] Step 76000 out of 120000 | Loss --> 2.096 | Grad_l2 --> 0.362 | Weights_l2 --> 44047.462 | Lr --> 0.004 | Seconds_per_step --> 2.038 | 
[2024-07-31 09:22:53,295][Main][INFO] - [train] Step 76100 out of 120000 | Loss --> 2.087 | Grad_l2 --> 0.370 | Weights_l2 --> 44070.737 | Lr --> 0.004 | Seconds_per_step --> 2.007 | [2024-07-31 09:26:16,294][Main][INFO] - [train] Step 76200 out of 120000 | Loss --> 2.090 | Grad_l2 --> 0.366 | Weights_l2 --> 44094.108 | Lr --> 0.004 | Seconds_per_step --> 2.030 | [2024-07-31 09:29:38,754][Main][INFO] - [train] Step 76300 out of 120000 | Loss --> 2.109 | Grad_l2 --> 0.371 | Weights_l2 --> 44117.512 | Lr --> 0.004 | Seconds_per_step --> 2.025 | [2024-07-31 09:32:57,977][Main][INFO] - [train] Step 76400 out of 120000 | Loss --> 2.099 | Grad_l2 --> 0.366 | Weights_l2 --> 44141.179 | Lr --> 0.004 | Seconds_per_step --> 1.992 | [2024-07-31 09:36:20,823][Main][INFO] - [train] Step 76500 out of 120000 | Loss --> 2.096 | Grad_l2 --> 0.366 | Weights_l2 --> 44163.983 | Lr --> 0.004 | Seconds_per_step --> 2.028 | [2024-07-31 09:39:43,852][Main][INFO] - [train] Step 76600 out of 120000 | Loss --> 2.095 | Grad_l2 --> 0.367 | Weights_l2 --> 44187.302 | Lr --> 0.004 | Seconds_per_step --> 2.030 | [2024-07-31 09:43:03,490][Main][INFO] - [train] Step 76700 out of 120000 | Loss --> 2.094 | Grad_l2 --> 0.367 | Weights_l2 --> 44210.536 | Lr --> 0.004 | Seconds_per_step --> 1.996 | [2024-07-31 09:46:25,468][Main][INFO] - [train] Step 76800 out of 120000 | Loss --> 2.103 | Grad_l2 --> 0.375 | Weights_l2 --> 44233.233 | Lr --> 0.004 | Seconds_per_step --> 2.020 | [2024-07-31 09:49:50,393][Main][INFO] - [train] Step 76900 out of 120000 | Loss --> 2.109 | Grad_l2 --> 0.368 | Weights_l2 --> 44255.891 | Lr --> 0.004 | Seconds_per_step --> 2.049 | [2024-07-31 09:53:09,542][Main][INFO] - [train] Step 77000 out of 120000 | Loss --> 2.118 | Grad_l2 --> 0.361 | Weights_l2 --> 44278.896 | Lr --> 0.004 | Seconds_per_step --> 1.991 | [2024-07-31 09:56:33,237][Main][INFO] - [train] Step 77100 out of 120000 | Loss --> 2.113 | Grad_l2 --> 0.365 | Weights_l2 --> 44301.481 | Lr --> 0.004 | 
Seconds_per_step --> 2.037 | [2024-07-31 09:59:54,494][Main][INFO] - [train] Step 77200 out of 120000 | Loss --> 2.121 | Grad_l2 --> 0.368 | Weights_l2 --> 44324.305 | Lr --> 0.004 | Seconds_per_step --> 2.013 | [2024-07-31 10:03:13,273][Main][INFO] - [train] Step 77300 out of 120000 | Loss --> 2.110 | Grad_l2 --> 0.368 | Weights_l2 --> 44346.981 | Lr --> 0.004 | Seconds_per_step --> 1.988 | [2024-07-31 10:06:36,839][Main][INFO] - [train] Step 77400 out of 120000 | Loss --> 2.101 | Grad_l2 --> 0.358 | Weights_l2 --> 44369.897 | Lr --> 0.004 | Seconds_per_step --> 2.036 | [2024-07-31 10:09:57,493][Main][INFO] - [train] Step 77500 out of 120000 | Loss --> 2.096 | Grad_l2 --> 0.364 | Weights_l2 --> 44392.230 | Lr --> 0.004 | Seconds_per_step --> 2.007 | [2024-07-31 10:13:17,119][Main][INFO] - [train] Step 77600 out of 120000 | Loss --> 2.092 | Grad_l2 --> 0.364 | Weights_l2 --> 44414.144 | Lr --> 0.004 | Seconds_per_step --> 1.996 | [2024-07-31 10:16:39,105][Main][INFO] - [train] Step 77700 out of 120000 | Loss --> 2.081 | Grad_l2 --> 0.364 | Weights_l2 --> 44435.886 | Lr --> 0.004 | Seconds_per_step --> 2.020 | [2024-07-31 10:20:02,065][Main][INFO] - [train] Step 77800 out of 120000 | Loss --> 2.071 | Grad_l2 --> 0.367 | Weights_l2 --> 44457.736 | Lr --> 0.004 | Seconds_per_step --> 2.030 | [2024-07-31 10:23:20,340][Main][INFO] - [train] Step 77900 out of 120000 | Loss --> 2.070 | Grad_l2 --> 0.363 | Weights_l2 --> 44479.019 | Lr --> 0.004 | Seconds_per_step --> 1.983 | [2024-07-31 10:26:43,553][Main][INFO] - [train] Step 78000 out of 120000 | Loss --> 2.075 | Grad_l2 --> 0.364 | Weights_l2 --> 44500.704 | Lr --> 0.004 | Seconds_per_step --> 2.032 | [2024-07-31 10:30:04,555][Main][INFO] - [train] Step 78100 out of 120000 | Loss --> 2.067 | Grad_l2 --> 0.362 | Weights_l2 --> 44522.252 | Lr --> 0.004 | Seconds_per_step --> 2.010 | [2024-07-31 10:33:23,733][Main][INFO] - [train] Step 78200 out of 120000 | Loss --> 2.056 | Grad_l2 --> 0.363 | Weights_l2 --> 44543.507 | 
Lr --> 0.004 | Seconds_per_step --> 1.992 | [2024-07-31 10:36:49,337][Main][INFO] - [train] Step 78300 out of 120000 | Loss --> 2.059 | Grad_l2 --> 0.361 | Weights_l2 --> 44565.161 | Lr --> 0.004 | Seconds_per_step --> 2.056 | [2024-07-31 10:40:09,145][Main][INFO] - [train] Step 78400 out of 120000 | Loss --> 2.058 | Grad_l2 --> 0.367 | Weights_l2 --> 44587.046 | Lr --> 0.004 | Seconds_per_step --> 1.998 | [2024-07-31 10:43:28,429][Main][INFO] - [train] Step 78500 out of 120000 | Loss --> 2.058 | Grad_l2 --> 0.387 | Weights_l2 --> 44608.563 | Lr --> 0.004 | Seconds_per_step --> 1.993 | [2024-07-31 10:46:52,934][Main][INFO] - [train] Step 78600 out of 120000 | Loss --> 2.054 | Grad_l2 --> 0.360 | Weights_l2 --> 44630.103 | Lr --> 0.004 | Seconds_per_step --> 2.045 | [2024-07-31 10:50:13,406][Main][INFO] - [train] Step 78700 out of 120000 | Loss --> 2.046 | Grad_l2 --> 0.367 | Weights_l2 --> 44651.063 | Lr --> 0.004 | Seconds_per_step --> 2.005 | [2024-07-31 10:53:32,120][Main][INFO] - [train] Step 78800 out of 120000 | Loss --> 2.039 | Grad_l2 --> 0.365 | Weights_l2 --> 44672.482 | Lr --> 0.004 | Seconds_per_step --> 1.987 | [2024-07-31 10:56:55,892][Main][INFO] - [train] Step 78900 out of 120000 | Loss --> 2.054 | Grad_l2 --> 0.362 | Weights_l2 --> 44694.179 | Lr --> 0.004 | Seconds_per_step --> 2.038 | [2024-07-31 11:00:16,455][Main][INFO] - [train] Step 79000 out of 120000 | Loss --> 2.040 | Grad_l2 --> 0.365 | Weights_l2 --> 44715.738 | Lr --> 0.004 | Seconds_per_step --> 2.006 | [2024-07-31 11:03:36,575][Main][INFO] - [train] Step 79100 out of 120000 | Loss --> 2.042 | Grad_l2 --> 0.367 | Weights_l2 --> 44736.776 | Lr --> 0.004 | Seconds_per_step --> 2.001 | [2024-07-31 11:07:02,037][Main][INFO] - [train] Step 79200 out of 120000 | Loss --> 2.036 | Grad_l2 --> 0.378 | Weights_l2 --> 44758.227 | Lr --> 0.004 | Seconds_per_step --> 2.055 | [2024-07-31 11:10:20,587][Main][INFO] - [train] Step 79300 out of 120000 | Loss --> 2.020 | Grad_l2 --> 0.359 | Weights_l2 
--> 44779.408 | Lr --> 0.004 | Seconds_per_step --> 1.985 | [2024-07-31 11:13:41,440][Main][INFO] - [train] Step 79400 out of 120000 | Loss --> 2.045 | Grad_l2 --> 0.363 | Weights_l2 --> 44800.742 | Lr --> 0.004 | Seconds_per_step --> 2.009 | [2024-07-31 11:17:04,454][Main][INFO] - [train] Step 79500 out of 120000 | Loss --> 2.037 | Grad_l2 --> 0.353 | Weights_l2 --> 44821.893 | Lr --> 0.004 | Seconds_per_step --> 2.030 | [2024-07-31 11:20:24,173][Main][INFO] - [train] Step 79600 out of 120000 | Loss --> 2.031 | Grad_l2 --> 0.356 | Weights_l2 --> 44842.713 | Lr --> 0.004 | Seconds_per_step --> 1.997 | [2024-07-31 11:23:45,968][Main][INFO] - [train] Step 79700 out of 120000 | Loss --> 2.031 | Grad_l2 --> 0.351 | Weights_l2 --> 44863.462 | Lr --> 0.004 | Seconds_per_step --> 2.018 | [2024-07-31 11:27:08,623][Main][INFO] - [train] Step 79800 out of 120000 | Loss --> 2.037 | Grad_l2 --> 0.356 | Weights_l2 --> 44884.753 | Lr --> 0.004 | Seconds_per_step --> 2.027 | [2024-07-31 11:30:27,510][Main][INFO] - [train] Step 79900 out of 120000 | Loss --> 2.036 | Grad_l2 --> 0.378 | Weights_l2 --> 44905.644 | Lr --> 0.004 | Seconds_per_step --> 1.989 | [2024-07-31 11:33:50,183][Main][INFO] - [train] Step 80000 out of 120000 | Loss --> 2.035 | Grad_l2 --> 0.367 | Weights_l2 --> 44926.759 | Lr --> 0.004 | Seconds_per_step --> 2.027 | [2024-07-31 11:44:54,422][Main][INFO] - [eval] Step 80000 out of 120000 | Loss --> 2.155 | Accuracy --> 0.609 | Time --> 664.236 | [2024-07-31 11:44:54,425][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-80000 [2024-07-31 11:44:54,429][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-31 11:44:57,655][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-80000/model.safetensors [2024-07-31 11:44:57,715][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-80000/optimizer.bin [2024-07-31 11:44:57,717][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-80000/scheduler.bin [2024-07-31 11:44:57,717][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-80000/sampler.bin [2024-07-31 11:44:57,717][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-80000/sampler_1.bin [2024-07-31 11:44:57,718][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-80000/random_states_0.pkl [2024-07-31 11:48:22,160][Main][INFO] - [train] Step 80100 out of 120000 | Loss --> 2.050 | Grad_l2 --> 0.362 | Weights_l2 --> 44947.778 | Lr --> 0.004 | Seconds_per_step --> 2.077 | [2024-07-31 11:51:42,551][Main][INFO] - [train] Step 80200 out of 120000 | Loss --> 2.042 | Grad_l2 --> 0.357 | Weights_l2 --> 44968.933 | Lr --> 0.004 | Seconds_per_step --> 2.004 | [2024-07-31 11:55:07,662][Main][INFO] - [train] Step 80300 out of 120000 | Loss --> 2.040 | Grad_l2 --> 0.364 | Weights_l2 --> 44990.384 | Lr --> 0.004 | Seconds_per_step --> 2.051 | [2024-07-31 11:58:30,271][Main][INFO] - [train] Step 80400 out of 120000 | Loss --> 2.040 | Grad_l2 --> 0.359 | Weights_l2 --> 45011.595 | Lr --> 0.004 | Seconds_per_step --> 2.026 | [2024-07-31 12:01:52,847][Main][INFO] - [train] Step 80500 out of 120000 | Loss --> 2.045 | Grad_l2 --> 0.365 | Weights_l2 --> 45032.942 | Lr --> 0.004 | Seconds_per_step --> 2.026 | [2024-07-31 12:05:15,284][Main][INFO] - [train] Step 80600 out of 120000 | Loss --> 2.042 | Grad_l2 --> 0.365 | Weights_l2 --> 45054.588 | Lr --> 0.004 | Seconds_per_step --> 2.024 | [2024-07-31 12:08:39,671][Main][INFO] - [train] Step 80700 
out of 120000 | Loss --> 2.068 | Grad_l2 --> 0.369 | Weights_l2 --> 45075.842 | Lr --> 0.004 | Seconds_per_step --> 2.044 | [2024-07-31 12:12:01,459][Main][INFO] - [train] Step 80800 out of 120000 | Loss --> 2.054 | Grad_l2 --> 0.367 | Weights_l2 --> 45097.713 | Lr --> 0.004 | Seconds_per_step --> 2.018 | [2024-07-31 12:15:24,919][Main][INFO] - [train] Step 80900 out of 120000 | Loss --> 2.060 | Grad_l2 --> 0.372 | Weights_l2 --> 45119.906 | Lr --> 0.004 | Seconds_per_step --> 2.035 | [2024-07-31 12:18:48,480][Main][INFO] - [train] Step 81000 out of 120000 | Loss --> 2.068 | Grad_l2 --> 0.364 | Weights_l2 --> 45141.775 | Lr --> 0.004 | Seconds_per_step --> 2.036 | [2024-07-31 12:22:11,390][Main][INFO] - [train] Step 81100 out of 120000 | Loss --> 2.050 | Grad_l2 --> 0.363 | Weights_l2 --> 45164.108 | Lr --> 0.004 | Seconds_per_step --> 2.029 | [2024-07-31 12:25:34,668][Main][INFO] - [train] Step 81200 out of 120000 | Loss --> 2.063 | Grad_l2 --> 0.363 | Weights_l2 --> 45186.445 | Lr --> 0.004 | Seconds_per_step --> 2.033 | [2024-07-31 12:28:57,382][Main][INFO] - [train] Step 81300 out of 120000 | Loss --> 2.058 | Grad_l2 --> 0.365 | Weights_l2 --> 45208.650 | Lr --> 0.004 | Seconds_per_step --> 2.027 | [2024-07-31 12:32:20,352][Main][INFO] - [train] Step 81400 out of 120000 | Loss --> 2.059 | Grad_l2 --> 0.364 | Weights_l2 --> 45230.799 | Lr --> 0.004 | Seconds_per_step --> 2.030 | [2024-07-31 12:35:42,847][Main][INFO] - [train] Step 81500 out of 120000 | Loss --> 2.070 | Grad_l2 --> 0.367 | Weights_l2 --> 45253.149 | Lr --> 0.004 | Seconds_per_step --> 2.025 | [2024-07-31 12:39:07,396][Main][INFO] - [train] Step 81600 out of 120000 | Loss --> 2.048 | Grad_l2 --> 0.367 | Weights_l2 --> 45275.793 | Lr --> 0.004 | Seconds_per_step --> 2.045 | [2024-07-31 12:42:29,513][Main][INFO] - [train] Step 81700 out of 120000 | Loss --> 2.059 | Grad_l2 --> 0.371 | Weights_l2 --> 45298.441 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-07-31 12:45:54,166][Main][INFO] - 
[train] Step 81800 out of 120000 | Loss --> 2.051 | Grad_l2 --> 0.364 | Weights_l2 --> 45321.469 | Lr --> 0.003 | Seconds_per_step --> 2.047 | [2024-07-31 12:49:18,099][Main][INFO] - [train] Step 81900 out of 120000 | Loss --> 2.052 | Grad_l2 --> 0.368 | Weights_l2 --> 45343.864 | Lr --> 0.003 | Seconds_per_step --> 2.039 | [2024-07-31 12:52:41,936][Main][INFO] - [train] Step 82000 out of 120000 | Loss --> 2.048 | Grad_l2 --> 0.358 | Weights_l2 --> 45366.903 | Lr --> 0.003 | Seconds_per_step --> 2.038 | [2024-07-31 12:56:05,937][Main][INFO] - [train] Step 82100 out of 120000 | Loss --> 2.048 | Grad_l2 --> 0.362 | Weights_l2 --> 45389.983 | Lr --> 0.003 | Seconds_per_step --> 2.040 | [2024-07-31 12:59:26,085][Main][INFO] - [train] Step 82200 out of 120000 | Loss --> 2.035 | Grad_l2 --> 0.363 | Weights_l2 --> 45412.862 | Lr --> 0.003 | Seconds_per_step --> 2.001 | [2024-07-31 13:02:48,232][Main][INFO] - [train] Step 82300 out of 120000 | Loss --> 2.038 | Grad_l2 --> 0.374 | Weights_l2 --> 45435.817 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-07-31 13:06:14,933][Main][INFO] - [train] Step 82400 out of 120000 | Loss --> 2.029 | Grad_l2 --> 0.368 | Weights_l2 --> 45459.029 | Lr --> 0.003 | Seconds_per_step --> 2.067 | [2024-07-31 13:09:34,885][Main][INFO] - [train] Step 82500 out of 120000 | Loss --> 2.046 | Grad_l2 --> 0.361 | Weights_l2 --> 45481.684 | Lr --> 0.003 | Seconds_per_step --> 2.000 | [2024-07-31 13:12:57,554][Main][INFO] - [train] Step 82600 out of 120000 | Loss --> 2.042 | Grad_l2 --> 0.361 | Weights_l2 --> 45504.646 | Lr --> 0.003 | Seconds_per_step --> 2.027 | [2024-07-31 13:16:22,720][Main][INFO] - [train] Step 82700 out of 120000 | Loss --> 2.040 | Grad_l2 --> 0.371 | Weights_l2 --> 45527.372 | Lr --> 0.003 | Seconds_per_step --> 2.052 | [2024-07-31 13:19:42,399][Main][INFO] - [train] Step 82800 out of 120000 | Loss --> 2.041 | Grad_l2 --> 0.369 | Weights_l2 --> 45549.721 | Lr --> 0.003 | Seconds_per_step --> 1.997 | [2024-07-31 
13:23:07,148][Main][INFO] - [train] Step 82900 out of 120000 | Loss --> 2.041 | Grad_l2 --> 0.364 | Weights_l2 --> 45572.541 | Lr --> 0.003 | Seconds_per_step --> 2.047 | [2024-07-31 13:26:27,753][Main][INFO] - [train] Step 83000 out of 120000 | Loss --> 2.061 | Grad_l2 --> 0.405 | Weights_l2 --> 45594.847 | Lr --> 0.003 | Seconds_per_step --> 2.006 | [2024-07-31 13:29:49,671][Main][INFO] - [train] Step 83100 out of 120000 | Loss --> 2.057 | Grad_l2 --> 0.373 | Weights_l2 --> 45617.293 | Lr --> 0.003 | Seconds_per_step --> 2.019 | [2024-07-31 13:33:14,621][Main][INFO] - [train] Step 83200 out of 120000 | Loss --> 2.058 | Grad_l2 --> 0.368 | Weights_l2 --> 45639.510 | Lr --> 0.003 | Seconds_per_step --> 2.049 | [2024-07-31 13:36:37,839][Main][INFO] - [train] Step 83300 out of 120000 | Loss --> 2.074 | Grad_l2 --> 0.365 | Weights_l2 --> 45661.986 | Lr --> 0.003 | Seconds_per_step --> 2.032 | [2024-07-31 13:40:00,034][Main][INFO] - [train] Step 83400 out of 120000 | Loss --> 2.068 | Grad_l2 --> 0.366 | Weights_l2 --> 45684.264 | Lr --> 0.003 | Seconds_per_step --> 2.022 | [2024-07-31 13:43:23,936][Main][INFO] - [train] Step 83500 out of 120000 | Loss --> 2.055 | Grad_l2 --> 0.366 | Weights_l2 --> 45706.354 | Lr --> 0.003 | Seconds_per_step --> 2.039 | [2024-07-31 13:46:44,971][Main][INFO] - [train] Step 83600 out of 120000 | Loss --> 2.052 | Grad_l2 --> 0.365 | Weights_l2 --> 45728.561 | Lr --> 0.003 | Seconds_per_step --> 2.010 | [2024-07-31 13:50:05,635][Main][INFO] - [train] Step 83700 out of 120000 | Loss --> 2.061 | Grad_l2 --> 0.368 | Weights_l2 --> 45750.857 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 13:53:29,141][Main][INFO] - [train] Step 83800 out of 120000 | Loss --> 2.059 | Grad_l2 --> 0.367 | Weights_l2 --> 45772.304 | Lr --> 0.003 | Seconds_per_step --> 2.035 | [2024-07-31 13:56:51,221][Main][INFO] - [train] Step 83900 out of 120000 | Loss --> 2.060 | Grad_l2 --> 0.362 | Weights_l2 --> 45794.363 | Lr --> 0.003 | Seconds_per_step --> 2.020 
| [2024-07-31 14:00:13,196][Main][INFO] - [train] Step 84000 out of 120000 | Loss --> 2.076 | Grad_l2 --> 0.370 | Weights_l2 --> 45816.232 | Lr --> 0.003 | Seconds_per_step --> 2.019 | [2024-07-31 14:03:36,786][Main][INFO] - [train] Step 84100 out of 120000 | Loss --> 2.085 | Grad_l2 --> 0.384 | Weights_l2 --> 45837.562 | Lr --> 0.003 | Seconds_per_step --> 2.036 | [2024-07-31 14:06:57,882][Main][INFO] - [train] Step 84200 out of 120000 | Loss --> 2.096 | Grad_l2 --> 0.374 | Weights_l2 --> 45859.531 | Lr --> 0.003 | Seconds_per_step --> 2.011 | [2024-07-31 14:10:21,472][Main][INFO] - [train] Step 84300 out of 120000 | Loss --> 2.090 | Grad_l2 --> 0.422 | Weights_l2 --> 45881.942 | Lr --> 0.003 | Seconds_per_step --> 2.036 | [2024-07-31 14:13:42,142][Main][INFO] - [train] Step 84400 out of 120000 | Loss --> 2.099 | Grad_l2 --> 0.369 | Weights_l2 --> 45904.183 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 14:17:04,410][Main][INFO] - [train] Step 84500 out of 120000 | Loss --> 2.097 | Grad_l2 --> 0.363 | Weights_l2 --> 45926.658 | Lr --> 0.003 | Seconds_per_step --> 2.023 | [2024-07-31 14:20:28,537][Main][INFO] - [train] Step 84600 out of 120000 | Loss --> 2.114 | Grad_l2 --> 0.391 | Weights_l2 --> 45949.221 | Lr --> 0.003 | Seconds_per_step --> 2.041 | [2024-07-31 14:23:50,962][Main][INFO] - [train] Step 84700 out of 120000 | Loss --> 2.120 | Grad_l2 --> 0.372 | Weights_l2 --> 45971.811 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-07-31 14:27:14,286][Main][INFO] - [train] Step 84800 out of 120000 | Loss --> 2.130 | Grad_l2 --> 0.372 | Weights_l2 --> 45994.866 | Lr --> 0.003 | Seconds_per_step --> 2.033 | [2024-07-31 14:30:40,710][Main][INFO] - [train] Step 84900 out of 120000 | Loss --> 2.134 | Grad_l2 --> 0.364 | Weights_l2 --> 46017.196 | Lr --> 0.003 | Seconds_per_step --> 2.064 | [2024-07-31 14:34:04,539][Main][INFO] - [train] Step 85000 out of 120000 | Loss --> 2.152 | Grad_l2 --> 0.376 | Weights_l2 --> 46040.371 | Lr --> 0.003 | 
Seconds_per_step --> 2.038 | [2024-07-31 14:45:03,481][Main][INFO] - [eval] Step 85000 out of 120000 | Loss --> 2.139 | Accuracy --> 0.612 | Time --> 658.897 | [2024-07-31 14:48:24,539][Main][INFO] - [train] Step 85100 out of 120000 | Loss --> 2.140 | Grad_l2 --> 0.369 | Weights_l2 --> 46063.097 | Lr --> 0.003 | Seconds_per_step --> 2.011 | [2024-07-31 14:51:49,196][Main][INFO] - [train] Step 85200 out of 120000 | Loss --> 2.127 | Grad_l2 --> 0.370 | Weights_l2 --> 46086.171 | Lr --> 0.003 | Seconds_per_step --> 2.046 | [2024-07-31 14:55:14,482][Main][INFO] - [train] Step 85300 out of 120000 | Loss --> 2.123 | Grad_l2 --> 0.377 | Weights_l2 --> 46108.851 | Lr --> 0.003 | Seconds_per_step --> 2.053 | [2024-07-31 14:58:39,971][Main][INFO] - [train] Step 85400 out of 120000 | Loss --> 2.129 | Grad_l2 --> 0.378 | Weights_l2 --> 46132.115 | Lr --> 0.003 | Seconds_per_step --> 2.055 | [2024-07-31 15:02:03,765][Main][INFO] - [train] Step 85500 out of 120000 | Loss --> 2.131 | Grad_l2 --> 0.373 | Weights_l2 --> 46155.283 | Lr --> 0.003 | Seconds_per_step --> 2.038 | [2024-07-31 15:05:27,991][Main][INFO] - [train] Step 85600 out of 120000 | Loss --> 2.113 | Grad_l2 --> 0.374 | Weights_l2 --> 46178.226 | Lr --> 0.003 | Seconds_per_step --> 2.042 | [2024-07-31 15:08:52,640][Main][INFO] - [train] Step 85700 out of 120000 | Loss --> 2.089 | Grad_l2 --> 0.372 | Weights_l2 --> 46201.639 | Lr --> 0.003 | Seconds_per_step --> 2.046 | [2024-07-31 15:12:16,732][Main][INFO] - [train] Step 85800 out of 120000 | Loss --> 2.089 | Grad_l2 --> 0.372 | Weights_l2 --> 46225.210 | Lr --> 0.003 | Seconds_per_step --> 2.041 | [2024-07-31 15:15:38,462][Main][INFO] - [train] Step 85900 out of 120000 | Loss --> 2.111 | Grad_l2 --> 0.366 | Weights_l2 --> 46248.024 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 15:19:02,214][Main][INFO] - [train] Step 86000 out of 120000 | Loss --> 2.108 | Grad_l2 --> 0.368 | Weights_l2 --> 46271.243 | Lr --> 0.003 | Seconds_per_step --> 2.038 | 
[2024-07-31 15:22:27,200][Main][INFO] - [train] Step 86100 out of 120000 | Loss --> 2.081 | Grad_l2 --> 0.368 | Weights_l2 --> 46294.085 | Lr --> 0.003 | Seconds_per_step --> 2.050 | [2024-07-31 15:25:48,822][Main][INFO] - [train] Step 86200 out of 120000 | Loss --> 2.090 | Grad_l2 --> 0.371 | Weights_l2 --> 46317.374 | Lr --> 0.003 | Seconds_per_step --> 2.016 | [2024-07-31 15:29:11,891][Main][INFO] - [train] Step 86300 out of 120000 | Loss --> 2.085 | Grad_l2 --> 0.373 | Weights_l2 --> 46340.131 | Lr --> 0.003 | Seconds_per_step --> 2.031 | [2024-07-31 15:33:22,149][Main][INFO] - [train] Step 86400 out of 120000 | Loss --> 2.076 | Grad_l2 --> 0.367 | Weights_l2 --> 46363.192 | Lr --> 0.003 | Seconds_per_step --> 2.503 | [2024-07-31 15:36:43,196][Main][INFO] - [train] Step 86500 out of 120000 | Loss --> 2.068 | Grad_l2 --> 0.373 | Weights_l2 --> 46386.002 | Lr --> 0.003 | Seconds_per_step --> 2.010 | [2024-07-31 15:40:04,939][Main][INFO] - [train] Step 86600 out of 120000 | Loss --> 2.068 | Grad_l2 --> 0.374 | Weights_l2 --> 46408.191 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 15:43:25,069][Main][INFO] - [train] Step 86700 out of 120000 | Loss --> 2.080 | Grad_l2 --> 0.374 | Weights_l2 --> 46430.369 | Lr --> 0.003 | Seconds_per_step --> 2.001 | [2024-07-31 15:46:47,638][Main][INFO] - [train] Step 86800 out of 120000 | Loss --> 2.050 | Grad_l2 --> 0.369 | Weights_l2 --> 46452.230 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-07-31 15:50:08,164][Main][INFO] - [train] Step 86900 out of 120000 | Loss --> 2.054 | Grad_l2 --> 0.378 | Weights_l2 --> 46473.877 | Lr --> 0.003 | Seconds_per_step --> 2.005 | [2024-07-31 15:53:27,374][Main][INFO] - [train] Step 87000 out of 120000 | Loss --> 2.065 | Grad_l2 --> 0.372 | Weights_l2 --> 46495.936 | Lr --> 0.003 | Seconds_per_step --> 1.992 | [2024-07-31 15:56:51,547][Main][INFO] - [train] Step 87100 out of 120000 | Loss --> 2.062 | Grad_l2 --> 0.370 | Weights_l2 --> 46517.277 | Lr --> 0.003 | 
Seconds_per_step --> 2.042 | [2024-07-31 16:00:12,004][Main][INFO] - [train] Step 87200 out of 120000 | Loss --> 2.050 | Grad_l2 --> 0.368 | Weights_l2 --> 46539.216 | Lr --> 0.003 | Seconds_per_step --> 2.005 | [2024-07-31 16:03:31,438][Main][INFO] - [train] Step 87300 out of 120000 | Loss --> 2.064 | Grad_l2 --> 0.378 | Weights_l2 --> 46560.584 | Lr --> 0.003 | Seconds_per_step --> 1.994 | [2024-07-31 16:06:54,490][Main][INFO] - [train] Step 87400 out of 120000 | Loss --> 2.060 | Grad_l2 --> 0.365 | Weights_l2 --> 46581.895 | Lr --> 0.003 | Seconds_per_step --> 2.031 | [2024-07-31 16:10:13,389][Main][INFO] - [train] Step 87500 out of 120000 | Loss --> 2.054 | Grad_l2 --> 0.389 | Weights_l2 --> 46603.263 | Lr --> 0.003 | Seconds_per_step --> 1.989 | [2024-07-31 16:13:35,125][Main][INFO] - [train] Step 87600 out of 120000 | Loss --> 2.052 | Grad_l2 --> 0.365 | Weights_l2 --> 46624.240 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 16:16:59,781][Main][INFO] - [train] Step 87700 out of 120000 | Loss --> 2.052 | Grad_l2 --> 0.363 | Weights_l2 --> 46645.313 | Lr --> 0.003 | Seconds_per_step --> 2.047 | [2024-07-31 16:20:20,292][Main][INFO] - [train] Step 87800 out of 120000 | Loss --> 2.049 | Grad_l2 --> 0.380 | Weights_l2 --> 46666.502 | Lr --> 0.003 | Seconds_per_step --> 2.005 | [2024-07-31 16:23:42,829][Main][INFO] - [train] Step 87900 out of 120000 | Loss --> 2.053 | Grad_l2 --> 0.368 | Weights_l2 --> 46687.217 | Lr --> 0.003 | Seconds_per_step --> 2.025 | [2024-07-31 16:27:08,753][Main][INFO] - [train] Step 88000 out of 120000 | Loss --> 2.037 | Grad_l2 --> 0.362 | Weights_l2 --> 46708.196 | Lr --> 0.003 | Seconds_per_step --> 2.059 | [2024-07-31 16:30:29,466][Main][INFO] - [train] Step 88100 out of 120000 | Loss --> 2.045 | Grad_l2 --> 0.359 | Weights_l2 --> 46729.026 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 16:33:52,693][Main][INFO] - [train] Step 88200 out of 120000 | Loss --> 2.049 | Grad_l2 --> 0.371 | Weights_l2 --> 46749.714 | 
Lr --> 0.003 | Seconds_per_step --> 2.032 | [2024-07-31 16:37:16,432][Main][INFO] - [train] Step 88300 out of 120000 | Loss --> 2.056 | Grad_l2 --> 0.374 | Weights_l2 --> 46770.829 | Lr --> 0.003 | Seconds_per_step --> 2.037 | [2024-07-31 16:40:38,245][Main][INFO] - [train] Step 88400 out of 120000 | Loss --> 2.039 | Grad_l2 --> 0.424 | Weights_l2 --> 46791.872 | Lr --> 0.003 | Seconds_per_step --> 2.018 | [2024-07-31 16:43:59,192][Main][INFO] - [train] Step 88500 out of 120000 | Loss --> 2.041 | Grad_l2 --> 0.365 | Weights_l2 --> 46813.153 | Lr --> 0.003 | Seconds_per_step --> 2.009 | [2024-07-31 16:47:24,316][Main][INFO] - [train] Step 88600 out of 120000 | Loss --> 2.036 | Grad_l2 --> 0.453 | Weights_l2 --> 46833.861 | Lr --> 0.003 | Seconds_per_step --> 2.051 | [2024-07-31 16:50:44,973][Main][INFO] - [train] Step 88700 out of 120000 | Loss --> 2.033 | Grad_l2 --> 0.375 | Weights_l2 --> 46854.967 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 16:54:06,362][Main][INFO] - [train] Step 88800 out of 120000 | Loss --> 2.038 | Grad_l2 --> 0.362 | Weights_l2 --> 46875.266 | Lr --> 0.003 | Seconds_per_step --> 2.014 | [2024-07-31 16:57:30,974][Main][INFO] - [train] Step 88900 out of 120000 | Loss --> 2.042 | Grad_l2 --> 0.368 | Weights_l2 --> 46895.872 | Lr --> 0.003 | Seconds_per_step --> 2.046 | [2024-07-31 17:00:52,855][Main][INFO] - [train] Step 89000 out of 120000 | Loss --> 2.035 | Grad_l2 --> 0.358 | Weights_l2 --> 46916.075 | Lr --> 0.003 | Seconds_per_step --> 2.019 | [2024-07-31 17:04:15,555][Main][INFO] - [train] Step 89100 out of 120000 | Loss --> 2.045 | Grad_l2 --> 0.389 | Weights_l2 --> 46936.621 | Lr --> 0.003 | Seconds_per_step --> 2.027 | [2024-07-31 17:07:39,443][Main][INFO] - [train] Step 89200 out of 120000 | Loss --> 2.021 | Grad_l2 --> 0.362 | Weights_l2 --> 46956.987 | Lr --> 0.003 | Seconds_per_step --> 2.039 | [2024-07-31 17:10:59,171][Main][INFO] - [train] Step 89300 out of 120000 | Loss --> 2.029 | Grad_l2 --> 0.370 | Weights_l2 
--> 46977.347 | Lr --> 0.003 | Seconds_per_step --> 1.997 | [2024-07-31 17:14:22,282][Main][INFO] - [train] Step 89400 out of 120000 | Loss --> 2.023 | Grad_l2 --> 0.364 | Weights_l2 --> 46997.708 | Lr --> 0.003 | Seconds_per_step --> 2.031 | [2024-07-31 17:17:45,175][Main][INFO] - [train] Step 89500 out of 120000 | Loss --> 2.032 | Grad_l2 --> 0.365 | Weights_l2 --> 47017.617 | Lr --> 0.003 | Seconds_per_step --> 2.029 | [2024-07-31 17:21:06,863][Main][INFO] - [train] Step 89600 out of 120000 | Loss --> 2.019 | Grad_l2 --> 0.363 | Weights_l2 --> 47038.256 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 17:24:28,985][Main][INFO] - [train] Step 89700 out of 120000 | Loss --> 2.024 | Grad_l2 --> 0.364 | Weights_l2 --> 47058.632 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-07-31 17:27:52,592][Main][INFO] - [train] Step 89800 out of 120000 | Loss --> 2.026 | Grad_l2 --> 0.364 | Weights_l2 --> 47079.196 | Lr --> 0.003 | Seconds_per_step --> 2.036 | [2024-07-31 17:31:12,957][Main][INFO] - [train] Step 89900 out of 120000 | Loss --> 2.027 | Grad_l2 --> 0.366 | Weights_l2 --> 47099.700 | Lr --> 0.003 | Seconds_per_step --> 2.004 | [2024-07-31 17:34:37,182][Main][INFO] - [train] Step 90000 out of 120000 | Loss --> 2.029 | Grad_l2 --> 0.368 | Weights_l2 --> 47120.174 | Lr --> 0.003 | Seconds_per_step --> 2.042 | [2024-07-31 17:45:34,887][Main][INFO] - [eval] Step 90000 out of 120000 | Loss --> 2.127 | Accuracy --> 0.613 | Time --> 657.702 | [2024-07-31 17:45:34,890][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-90000 [2024-07-31 17:45:34,894][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-31 17:45:37,955][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-90000/model.safetensors [2024-07-31 17:45:38,006][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-90000/optimizer.bin [2024-07-31 17:45:38,007][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-90000/scheduler.bin [2024-07-31 17:45:38,007][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-90000/sampler.bin [2024-07-31 17:45:38,007][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-90000/sampler_1.bin [2024-07-31 17:45:38,008][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-90000/random_states_0.pkl [2024-07-31 17:49:00,583][Main][INFO] - [train] Step 90100 out of 120000 | Loss --> 2.015 | Grad_l2 --> 0.365 | Weights_l2 --> 47140.677 | Lr --> 0.003 | Seconds_per_step --> 2.057 | [2024-07-31 17:52:22,153][Main][INFO] - [train] Step 90200 out of 120000 | Loss --> 1.997 | Grad_l2 --> 0.370 | Weights_l2 --> 47160.613 | Lr --> 0.003 | Seconds_per_step --> 2.016 | [2024-07-31 17:55:44,886][Main][INFO] - [train] Step 90300 out of 120000 | Loss --> 1.998 | Grad_l2 --> 0.361 | Weights_l2 --> 47180.539 | Lr --> 0.003 | Seconds_per_step --> 2.027 | [2024-07-31 17:59:06,299][Main][INFO] - [train] Step 90400 out of 120000 | Loss --> 1.989 | Grad_l2 --> 0.367 | Weights_l2 --> 47200.227 | Lr --> 0.003 | Seconds_per_step --> 2.014 | [2024-07-31 18:02:28,490][Main][INFO] - [train] Step 90500 out of 120000 | Loss --> 2.007 | Grad_l2 --> 0.472 | Weights_l2 --> 47220.364 | Lr --> 0.003 | Seconds_per_step --> 2.022 | [2024-07-31 18:05:50,362][Main][INFO] - [train] Step 90600 out of 120000 | Loss --> 2.004 | Grad_l2 --> 0.353 | Weights_l2 --> 47240.687 | Lr --> 0.003 | Seconds_per_step --> 2.019 | [2024-07-31 18:09:13,035][Main][INFO] - [train] Step 90700 
out of 120000 | Loss --> 1.979 | Grad_l2 --> 0.363 | Weights_l2 --> 47260.341 | Lr --> 0.003 | Seconds_per_step --> 2.027 | [2024-07-31 18:12:34,863][Main][INFO] - [train] Step 90800 out of 120000 | Loss --> 1.989 | Grad_l2 --> 0.358 | Weights_l2 --> 47280.534 | Lr --> 0.003 | Seconds_per_step --> 2.018 | [2024-07-31 18:15:57,469][Main][INFO] - [train] Step 90900 out of 120000 | Loss --> 1.992 | Grad_l2 --> 0.359 | Weights_l2 --> 47300.372 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-07-31 18:19:20,554][Main][INFO] - [train] Step 91000 out of 120000 | Loss --> 1.981 | Grad_l2 --> 0.364 | Weights_l2 --> 47320.435 | Lr --> 0.003 | Seconds_per_step --> 2.031 | [2024-07-31 18:22:41,552][Main][INFO] - [train] Step 91100 out of 120000 | Loss --> 1.988 | Grad_l2 --> 0.361 | Weights_l2 --> 47340.532 | Lr --> 0.003 | Seconds_per_step --> 2.010 | [2024-07-31 18:26:04,456][Main][INFO] - [train] Step 91200 out of 120000 | Loss --> 1.984 | Grad_l2 --> 0.360 | Weights_l2 --> 47360.263 | Lr --> 0.003 | Seconds_per_step --> 2.029 | [2024-07-31 18:29:28,381][Main][INFO] - [train] Step 91300 out of 120000 | Loss --> 2.002 | Grad_l2 --> 0.356 | Weights_l2 --> 47380.131 | Lr --> 0.003 | Seconds_per_step --> 2.039 | [2024-07-31 18:32:49,844][Main][INFO] - [train] Step 91400 out of 120000 | Loss --> 1.991 | Grad_l2 --> 0.359 | Weights_l2 --> 47400.232 | Lr --> 0.003 | Seconds_per_step --> 2.015 | [2024-07-31 18:36:11,903][Main][INFO] - [train] Step 91500 out of 120000 | Loss --> 1.987 | Grad_l2 --> 0.364 | Weights_l2 --> 47419.891 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-07-31 18:39:34,334][Main][INFO] - [train] Step 91600 out of 120000 | Loss --> 1.997 | Grad_l2 --> 0.401 | Weights_l2 --> 47440.085 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-07-31 18:42:58,165][Main][INFO] - [train] Step 91700 out of 120000 | Loss --> 2.007 | Grad_l2 --> 0.362 | Weights_l2 --> 47460.334 | Lr --> 0.003 | Seconds_per_step --> 2.038 | [2024-07-31 18:46:20,853][Main][INFO] - 
[train] Step 91800 out of 120000 | Loss --> 2.001 | Grad_l2 --> 0.368 | Weights_l2 --> 47480.716 | Lr --> 0.003 | Seconds_per_step --> 2.027 | [2024-07-31 18:49:41,647][Main][INFO] - [train] Step 91900 out of 120000 | Loss --> 2.016 | Grad_l2 --> 0.362 | Weights_l2 --> 47500.819 | Lr --> 0.003 | Seconds_per_step --> 2.008 | [2024-07-31 18:53:05,403][Main][INFO] - [train] Step 92000 out of 120000 | Loss --> 2.019 | Grad_l2 --> 0.364 | Weights_l2 --> 47521.502 | Lr --> 0.003 | Seconds_per_step --> 2.038 | [2024-07-31 18:56:28,835][Main][INFO] - [train] Step 92100 out of 120000 | Loss --> 2.017 | Grad_l2 --> 0.361 | Weights_l2 --> 47541.856 | Lr --> 0.003 | Seconds_per_step --> 2.034 | [2024-07-31 18:59:50,585][Main][INFO] - [train] Step 92200 out of 120000 | Loss --> 2.017 | Grad_l2 --> 0.361 | Weights_l2 --> 47562.062 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 19:03:14,173][Main][INFO] - [train] Step 92300 out of 120000 | Loss --> 2.005 | Grad_l2 --> 0.371 | Weights_l2 --> 47582.428 | Lr --> 0.003 | Seconds_per_step --> 2.036 | [2024-07-31 19:06:36,159][Main][INFO] - [train] Step 92400 out of 120000 | Loss --> 2.018 | Grad_l2 --> 0.363 | Weights_l2 --> 47603.014 | Lr --> 0.003 | Seconds_per_step --> 2.020 | [2024-07-31 19:10:01,708][Main][INFO] - [train] Step 92500 out of 120000 | Loss --> 2.012 | Grad_l2 --> 0.366 | Weights_l2 --> 47623.543 | Lr --> 0.003 | Seconds_per_step --> 2.055 | [2024-07-31 19:13:21,551][Main][INFO] - [train] Step 92600 out of 120000 | Loss --> 2.022 | Grad_l2 --> 0.367 | Weights_l2 --> 47644.545 | Lr --> 0.003 | Seconds_per_step --> 1.998 | [2024-07-31 19:16:40,435][Main][INFO] - [train] Step 92700 out of 120000 | Loss --> 2.043 | Grad_l2 --> 0.368 | Weights_l2 --> 47665.210 | Lr --> 0.003 | Seconds_per_step --> 1.989 | [2024-07-31 19:20:05,371][Main][INFO] - [train] Step 92800 out of 120000 | Loss --> 2.051 | Grad_l2 --> 0.360 | Weights_l2 --> 47685.716 | Lr --> 0.003 | Seconds_per_step --> 2.049 | [2024-07-31 
19:23:26,389][Main][INFO] - [train] Step 92900 out of 120000 | Loss --> 2.047 | Grad_l2 --> 0.366 | Weights_l2 --> 47705.980 | Lr --> 0.003 | Seconds_per_step --> 2.010 | [2024-07-31 19:26:51,238][Main][INFO] - [train] Step 93000 out of 120000 | Loss --> 2.018 | Grad_l2 --> 0.364 | Weights_l2 --> 47726.298 | Lr --> 0.003 | Seconds_per_step --> 2.048 | [2024-07-31 19:30:14,775][Main][INFO] - [train] Step 93100 out of 120000 | Loss --> 2.017 | Grad_l2 --> 0.364 | Weights_l2 --> 47746.905 | Lr --> 0.003 | Seconds_per_step --> 2.035 | [2024-07-31 19:33:35,539][Main][INFO] - [train] Step 93200 out of 120000 | Loss --> 2.012 | Grad_l2 --> 0.359 | Weights_l2 --> 47767.370 | Lr --> 0.003 | Seconds_per_step --> 2.008 | [2024-07-31 19:36:57,566][Main][INFO] - [train] Step 93300 out of 120000 | Loss --> 2.011 | Grad_l2 --> 0.373 | Weights_l2 --> 47787.564 | Lr --> 0.003 | Seconds_per_step --> 2.020 | [2024-07-31 19:40:19,255][Main][INFO] - [train] Step 93400 out of 120000 | Loss --> 2.015 | Grad_l2 --> 0.370 | Weights_l2 --> 47807.680 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 19:43:39,951][Main][INFO] - [train] Step 93500 out of 120000 | Loss --> 2.009 | Grad_l2 --> 0.376 | Weights_l2 --> 47827.561 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 19:47:02,439][Main][INFO] - [train] Step 93600 out of 120000 | Loss --> 1.992 | Grad_l2 --> 0.358 | Weights_l2 --> 47847.271 | Lr --> 0.003 | Seconds_per_step --> 2.025 | [2024-07-31 19:50:22,878][Main][INFO] - [train] Step 93700 out of 120000 | Loss --> 2.018 | Grad_l2 --> 0.364 | Weights_l2 --> 47867.157 | Lr --> 0.003 | Seconds_per_step --> 2.004 | [2024-07-31 19:53:45,518][Main][INFO] - [train] Step 93800 out of 120000 | Loss --> 2.005 | Grad_l2 --> 0.360 | Weights_l2 --> 47886.772 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-07-31 19:57:08,034][Main][INFO] - [train] Step 93900 out of 120000 | Loss --> 1.987 | Grad_l2 --> 0.362 | Weights_l2 --> 47906.351 | Lr --> 0.003 | Seconds_per_step --> 2.025 
| [2024-07-31 20:00:29,625][Main][INFO] - [train] Step 94000 out of 120000 | Loss --> 1.997 | Grad_l2 --> 0.358 | Weights_l2 --> 47925.785 | Lr --> 0.003 | Seconds_per_step --> 2.016 | [2024-07-31 20:03:54,560][Main][INFO] - [train] Step 94100 out of 120000 | Loss --> 1.988 | Grad_l2 --> 0.360 | Weights_l2 --> 47944.976 | Lr --> 0.003 | Seconds_per_step --> 2.049 | [2024-07-31 20:07:15,954][Main][INFO] - [train] Step 94200 out of 120000 | Loss --> 1.990 | Grad_l2 --> 0.356 | Weights_l2 --> 47964.520 | Lr --> 0.003 | Seconds_per_step --> 2.014 | [2024-07-31 20:10:36,668][Main][INFO] - [train] Step 94300 out of 120000 | Loss --> 1.981 | Grad_l2 --> 0.360 | Weights_l2 --> 47984.074 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 20:14:00,387][Main][INFO] - [train] Step 94400 out of 120000 | Loss --> 1.981 | Grad_l2 --> 0.373 | Weights_l2 --> 48003.372 | Lr --> 0.003 | Seconds_per_step --> 2.037 | [2024-07-31 20:17:23,357][Main][INFO] - [train] Step 94500 out of 120000 | Loss --> 1.984 | Grad_l2 --> 0.365 | Weights_l2 --> 48023.008 | Lr --> 0.003 | Seconds_per_step --> 2.030 | [2024-07-31 20:20:43,297][Main][INFO] - [train] Step 94600 out of 120000 | Loss --> 1.969 | Grad_l2 --> 0.358 | Weights_l2 --> 48042.328 | Lr --> 0.003 | Seconds_per_step --> 1.999 | [2024-07-31 20:24:06,184][Main][INFO] - [train] Step 94700 out of 120000 | Loss --> 1.979 | Grad_l2 --> 0.366 | Weights_l2 --> 48061.907 | Lr --> 0.003 | Seconds_per_step --> 2.029 | [2024-07-31 20:27:29,051][Main][INFO] - [train] Step 94800 out of 120000 | Loss --> 1.970 | Grad_l2 --> 0.349 | Weights_l2 --> 48081.281 | Lr --> 0.003 | Seconds_per_step --> 2.029 | [2024-07-31 20:30:49,638][Main][INFO] - [train] Step 94900 out of 120000 | Loss --> 1.975 | Grad_l2 --> 0.359 | Weights_l2 --> 48100.562 | Lr --> 0.003 | Seconds_per_step --> 2.006 | [2024-07-31 20:34:12,743][Main][INFO] - [train] Step 95000 out of 120000 | Loss --> 1.984 | Grad_l2 --> 0.364 | Weights_l2 --> 48119.788 | Lr --> 0.003 | 
Seconds_per_step --> 2.031 | [2024-07-31 20:45:12,488][Main][INFO] - [eval] Step 95000 out of 120000 | Loss --> 2.112 | Accuracy --> 0.615 | Time --> 659.743 | [2024-07-31 20:48:34,898][Main][INFO] - [train] Step 95100 out of 120000 | Loss --> 1.982 | Grad_l2 --> 0.391 | Weights_l2 --> 48139.323 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-07-31 20:51:57,010][Main][INFO] - [train] Step 95200 out of 120000 | Loss --> 1.972 | Grad_l2 --> 0.629 | Weights_l2 --> 48158.732 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-07-31 20:55:17,353][Main][INFO] - [train] Step 95300 out of 120000 | Loss --> 1.979 | Grad_l2 --> 0.545 | Weights_l2 --> 48177.871 | Lr --> 0.003 | Seconds_per_step --> 2.003 | [2024-07-31 20:58:41,035][Main][INFO] - [train] Step 95400 out of 120000 | Loss --> 1.988 | Grad_l2 --> 0.421 | Weights_l2 --> 48197.359 | Lr --> 0.003 | Seconds_per_step --> 2.037 | [2024-07-31 21:02:03,432][Main][INFO] - [train] Step 95500 out of 120000 | Loss --> 1.993 | Grad_l2 --> 0.616 | Weights_l2 --> 48216.833 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-07-31 21:05:24,799][Main][INFO] - [train] Step 95600 out of 120000 | Loss --> 1.996 | Grad_l2 --> 0.367 | Weights_l2 --> 48236.091 | Lr --> 0.003 | Seconds_per_step --> 2.014 | [2024-07-31 21:08:46,482][Main][INFO] - [train] Step 95700 out of 120000 | Loss --> 1.997 | Grad_l2 --> 0.363 | Weights_l2 --> 48255.479 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 21:12:08,323][Main][INFO] - [train] Step 95800 out of 120000 | Loss --> 1.994 | Grad_l2 --> 0.386 | Weights_l2 --> 48274.733 | Lr --> 0.003 | Seconds_per_step --> 2.018 | [2024-07-31 21:15:30,272][Main][INFO] - [train] Step 95900 out of 120000 | Loss --> 1.988 | Grad_l2 --> 0.368 | Weights_l2 --> 48294.159 | Lr --> 0.003 | Seconds_per_step --> 2.019 | [2024-07-31 21:18:54,252][Main][INFO] - [train] Step 96000 out of 120000 | Loss --> 1.990 | Grad_l2 --> 0.357 | Weights_l2 --> 48313.281 | Lr --> 0.003 | Seconds_per_step --> 2.040 | 
[2024-07-31 21:22:13,383][Main][INFO] - [train] Step 96100 out of 120000 | Loss --> 1.995 | Grad_l2 --> 0.367 | Weights_l2 --> 48332.249 | Lr --> 0.003 | Seconds_per_step --> 1.991 | [2024-07-31 21:25:35,598][Main][INFO] - [train] Step 96200 out of 120000 | Loss --> 1.985 | Grad_l2 --> 0.359 | Weights_l2 --> 48351.166 | Lr --> 0.003 | Seconds_per_step --> 2.022 | [2024-07-31 21:28:57,568][Main][INFO] - [train] Step 96300 out of 120000 | Loss --> 1.983 | Grad_l2 --> 0.368 | Weights_l2 --> 48370.048 | Lr --> 0.003 | Seconds_per_step --> 2.020 | [2024-07-31 21:32:17,441][Main][INFO] - [train] Step 96400 out of 120000 | Loss --> 1.982 | Grad_l2 --> 0.364 | Weights_l2 --> 48388.737 | Lr --> 0.003 | Seconds_per_step --> 1.999 | [2024-07-31 21:35:39,435][Main][INFO] - [train] Step 96500 out of 120000 | Loss --> 1.977 | Grad_l2 --> 0.360 | Weights_l2 --> 48407.699 | Lr --> 0.003 | Seconds_per_step --> 2.020 | [2024-07-31 21:39:02,023][Main][INFO] - [train] Step 96600 out of 120000 | Loss --> 1.993 | Grad_l2 --> 0.357 | Weights_l2 --> 48426.620 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-07-31 21:42:22,892][Main][INFO] - [train] Step 96700 out of 120000 | Loss --> 1.979 | Grad_l2 --> 0.361 | Weights_l2 --> 48445.210 | Lr --> 0.003 | Seconds_per_step --> 2.009 | [2024-07-31 21:45:44,754][Main][INFO] - [train] Step 96800 out of 120000 | Loss --> 1.986 | Grad_l2 --> 0.360 | Weights_l2 --> 48464.279 | Lr --> 0.003 | Seconds_per_step --> 2.019 | [2024-07-31 21:49:06,774][Main][INFO] - [train] Step 96900 out of 120000 | Loss --> 1.985 | Grad_l2 --> 0.363 | Weights_l2 --> 48483.572 | Lr --> 0.003 | Seconds_per_step --> 2.020 | [2024-07-31 21:52:28,453][Main][INFO] - [train] Step 97000 out of 120000 | Loss --> 1.998 | Grad_l2 --> 0.360 | Weights_l2 --> 48502.675 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 21:55:50,399][Main][INFO] - [train] Step 97100 out of 120000 | Loss --> 2.004 | Grad_l2 --> 0.359 | Weights_l2 --> 48521.766 | Lr --> 0.003 | 
Seconds_per_step --> 2.019 | [2024-07-31 21:59:13,428][Main][INFO] - [train] Step 97200 out of 120000 | Loss --> 2.008 | Grad_l2 --> 0.363 | Weights_l2 --> 48540.773 | Lr --> 0.003 | Seconds_per_step --> 2.030 | [2024-07-31 22:02:34,379][Main][INFO] - [train] Step 97300 out of 120000 | Loss --> 2.022 | Grad_l2 --> 0.460 | Weights_l2 --> 48559.730 | Lr --> 0.003 | Seconds_per_step --> 2.010 | [2024-07-31 22:05:56,463][Main][INFO] - [train] Step 97400 out of 120000 | Loss --> 2.018 | Grad_l2 --> 0.366 | Weights_l2 --> 48578.780 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-07-31 22:09:19,172][Main][INFO] - [train] Step 97500 out of 120000 | Loss --> 2.028 | Grad_l2 --> 0.362 | Weights_l2 --> 48598.453 | Lr --> 0.003 | Seconds_per_step --> 2.027 | [2024-07-31 22:12:40,083][Main][INFO] - [train] Step 97600 out of 120000 | Loss --> 2.043 | Grad_l2 --> 0.369 | Weights_l2 --> 48617.946 | Lr --> 0.003 | Seconds_per_step --> 2.009 | [2024-07-31 22:16:01,959][Main][INFO] - [train] Step 97700 out of 120000 | Loss --> 2.052 | Grad_l2 --> 0.363 | Weights_l2 --> 48637.678 | Lr --> 0.003 | Seconds_per_step --> 2.019 | [2024-07-31 22:19:23,670][Main][INFO] - [train] Step 97800 out of 120000 | Loss --> 2.040 | Grad_l2 --> 0.373 | Weights_l2 --> 48657.254 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 22:22:45,955][Main][INFO] - [train] Step 97900 out of 120000 | Loss --> 2.039 | Grad_l2 --> 0.367 | Weights_l2 --> 48676.798 | Lr --> 0.003 | Seconds_per_step --> 2.023 | [2024-07-31 22:26:10,385][Main][INFO] - [train] Step 98000 out of 120000 | Loss --> 2.046 | Grad_l2 --> 0.363 | Weights_l2 --> 48696.629 | Lr --> 0.003 | Seconds_per_step --> 2.044 | [2024-07-31 22:29:31,356][Main][INFO] - [train] Step 98100 out of 120000 | Loss --> 2.046 | Grad_l2 --> 0.362 | Weights_l2 --> 48716.692 | Lr --> 0.003 | Seconds_per_step --> 2.010 | [2024-07-31 22:32:54,337][Main][INFO] - [train] Step 98200 out of 120000 | Loss --> 2.053 | Grad_l2 --> 0.363 | Weights_l2 --> 48736.464 | 
Lr --> 0.003 | Seconds_per_step --> 2.030 | [2024-07-31 22:36:15,781][Main][INFO] - [train] Step 98300 out of 120000 | Loss --> 2.043 | Grad_l2 --> 0.357 | Weights_l2 --> 48756.049 | Lr --> 0.003 | Seconds_per_step --> 2.014 | [2024-07-31 22:39:38,185][Main][INFO] - [train] Step 98400 out of 120000 | Loss --> 2.039 | Grad_l2 --> 0.360 | Weights_l2 --> 48775.869 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-07-31 22:43:00,963][Main][INFO] - [train] Step 98500 out of 120000 | Loss --> 2.048 | Grad_l2 --> 0.364 | Weights_l2 --> 48795.693 | Lr --> 0.003 | Seconds_per_step --> 2.028 | [2024-07-31 22:46:22,972][Main][INFO] - [train] Step 98600 out of 120000 | Loss --> 2.048 | Grad_l2 --> 0.366 | Weights_l2 --> 48815.071 | Lr --> 0.003 | Seconds_per_step --> 2.020 | [2024-07-31 22:49:43,695][Main][INFO] - [train] Step 98700 out of 120000 | Loss --> 2.037 | Grad_l2 --> 0.361 | Weights_l2 --> 48834.663 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 22:53:06,080][Main][INFO] - [train] Step 98800 out of 120000 | Loss --> 2.033 | Grad_l2 --> 0.365 | Weights_l2 --> 48854.132 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-07-31 22:56:26,752][Main][INFO] - [train] Step 98900 out of 120000 | Loss --> 2.025 | Grad_l2 --> 0.362 | Weights_l2 --> 48873.630 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 22:59:47,529][Main][INFO] - [train] Step 99000 out of 120000 | Loss --> 2.029 | Grad_l2 --> 0.363 | Weights_l2 --> 48893.138 | Lr --> 0.003 | Seconds_per_step --> 2.008 | [2024-07-31 23:03:11,062][Main][INFO] - [train] Step 99100 out of 120000 | Loss --> 2.013 | Grad_l2 --> 0.357 | Weights_l2 --> 48912.311 | Lr --> 0.003 | Seconds_per_step --> 2.035 | [2024-07-31 23:06:33,661][Main][INFO] - [train] Step 99200 out of 120000 | Loss --> 2.005 | Grad_l2 --> 0.368 | Weights_l2 --> 48931.379 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-07-31 23:09:54,669][Main][INFO] - [train] Step 99300 out of 120000 | Loss --> 1.980 | Grad_l2 --> 0.361 | Weights_l2 
--> 48950.371 | Lr --> 0.003 | Seconds_per_step --> 2.010 | [2024-07-31 23:13:17,222][Main][INFO] - [train] Step 99400 out of 120000 | Loss --> 1.992 | Grad_l2 --> 0.363 | Weights_l2 --> 48969.019 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-07-31 23:16:38,661][Main][INFO] - [train] Step 99500 out of 120000 | Loss --> 1.977 | Grad_l2 --> 0.364 | Weights_l2 --> 48987.654 | Lr --> 0.003 | Seconds_per_step --> 2.014 | [2024-07-31 23:19:59,365][Main][INFO] - [train] Step 99600 out of 120000 | Loss --> 1.964 | Grad_l2 --> 0.365 | Weights_l2 --> 49006.081 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-07-31 23:23:23,062][Main][INFO] - [train] Step 99700 out of 120000 | Loss --> 1.969 | Grad_l2 --> 0.361 | Weights_l2 --> 49024.692 | Lr --> 0.003 | Seconds_per_step --> 2.037 | [2024-07-31 23:26:43,371][Main][INFO] - [train] Step 99800 out of 120000 | Loss --> 1.978 | Grad_l2 --> 0.360 | Weights_l2 --> 49043.221 | Lr --> 0.003 | Seconds_per_step --> 2.003 | [2024-07-31 23:30:05,475][Main][INFO] - [train] Step 99900 out of 120000 | Loss --> 1.955 | Grad_l2 --> 0.361 | Weights_l2 --> 49061.545 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-07-31 23:33:27,145][Main][INFO] - [train] Step 100000 out of 120000 | Loss --> 1.955 | Grad_l2 --> 0.364 | Weights_l2 --> 49079.980 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-07-31 23:44:25,711][Main][INFO] - [eval] Step 100000 out of 120000 | Loss --> 2.096 | Accuracy --> 0.617 | Time --> 658.563 | [2024-07-31 23:44:25,714][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-100000 [2024-07-31 23:44:25,717][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-07-31 23:44:28,742][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-100000/model.safetensors [2024-07-31 23:44:28,791][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-100000/optimizer.bin [2024-07-31 23:44:28,792][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-100000/scheduler.bin [2024-07-31 23:44:28,792][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-100000/sampler.bin [2024-07-31 23:44:28,792][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-100000/sampler_1.bin [2024-07-31 23:44:28,794][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-100000/random_states_0.pkl [2024-07-31 23:47:49,213][Main][INFO] - [train] Step 100100 out of 120000 | Loss --> 1.954 | Grad_l2 --> 0.359 | Weights_l2 --> 49098.437 | Lr --> 0.003 | Seconds_per_step --> 2.035 | [2024-07-31 23:51:13,753][Main][INFO] - [train] Step 100200 out of 120000 | Loss --> 1.956 | Grad_l2 --> 0.362 | Weights_l2 --> 49116.695 | Lr --> 0.003 | Seconds_per_step --> 2.045 | [2024-07-31 23:54:34,108][Main][INFO] - [train] Step 100300 out of 120000 | Loss --> 1.966 | Grad_l2 --> 0.365 | Weights_l2 --> 49135.378 | Lr --> 0.003 | Seconds_per_step --> 2.004 | [2024-07-31 23:57:55,389][Main][INFO] - [train] Step 100400 out of 120000 | Loss --> 1.960 | Grad_l2 --> 0.361 | Weights_l2 --> 49154.223 | Lr --> 0.003 | Seconds_per_step --> 2.013 | [2024-08-01 00:01:18,536][Main][INFO] - [train] Step 100500 out of 120000 | Loss --> 1.949 | Grad_l2 --> 0.359 | Weights_l2 --> 49173.093 | Lr --> 0.003 | Seconds_per_step --> 2.031 | [2024-08-01 00:04:39,762][Main][INFO] - [train] Step 100600 out of 120000 | Loss --> 1.957 | Grad_l2 --> 0.353 | Weights_l2 --> 49191.558 | Lr --> 0.003 | Seconds_per_step --> 2.012 | [2024-08-01 00:08:04,624][Main][INFO] - [train] 
Step 100700 out of 120000 | Loss --> 1.973 | Grad_l2 --> 0.359 | Weights_l2 --> 49210.622 | Lr --> 0.003 | Seconds_per_step --> 2.049 | [2024-08-01 00:11:25,955][Main][INFO] - [train] Step 100800 out of 120000 | Loss --> 1.970 | Grad_l2 --> 0.356 | Weights_l2 --> 49229.569 | Lr --> 0.003 | Seconds_per_step --> 2.013 | [2024-08-01 00:14:47,292][Main][INFO] - [train] Step 100900 out of 120000 | Loss --> 1.970 | Grad_l2 --> 0.364 | Weights_l2 --> 49248.740 | Lr --> 0.003 | Seconds_per_step --> 2.013 | [2024-08-01 00:18:08,987][Main][INFO] - [train] Step 101000 out of 120000 | Loss --> 1.975 | Grad_l2 --> 0.358 | Weights_l2 --> 49268.412 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-08-01 00:21:34,572][Main][INFO] - [train] Step 101100 out of 120000 | Loss --> 1.964 | Grad_l2 --> 0.367 | Weights_l2 --> 49287.873 | Lr --> 0.003 | Seconds_per_step --> 2.056 | [2024-08-01 00:24:53,735][Main][INFO] - [train] Step 101200 out of 120000 | Loss --> 1.969 | Grad_l2 --> 0.359 | Weights_l2 --> 49306.948 | Lr --> 0.003 | Seconds_per_step --> 1.992 | [2024-08-01 00:28:19,938][Main][INFO] - [train] Step 101300 out of 120000 | Loss --> 1.986 | Grad_l2 --> 0.357 | Weights_l2 --> 49326.553 | Lr --> 0.003 | Seconds_per_step --> 2.062 | [2024-08-01 00:31:41,173][Main][INFO] - [train] Step 101400 out of 120000 | Loss --> 1.974 | Grad_l2 --> 0.357 | Weights_l2 --> 49345.909 | Lr --> 0.003 | Seconds_per_step --> 2.012 | [2024-08-01 00:35:01,256][Main][INFO] - [train] Step 101500 out of 120000 | Loss --> 1.973 | Grad_l2 --> 0.363 | Weights_l2 --> 49365.186 | Lr --> 0.003 | Seconds_per_step --> 2.001 | [2024-08-01 00:38:26,934][Main][INFO] - [train] Step 101600 out of 120000 | Loss --> 1.984 | Grad_l2 --> 0.353 | Weights_l2 --> 49385.203 | Lr --> 0.003 | Seconds_per_step --> 2.057 | [2024-08-01 00:41:46,354][Main][INFO] - [train] Step 101700 out of 120000 | Loss --> 1.981 | Grad_l2 --> 0.361 | Weights_l2 --> 49404.977 | Lr --> 0.003 | Seconds_per_step --> 1.994 | [2024-08-01 
00:45:10,049][Main][INFO] - [train] Step 101800 out of 120000 | Loss --> 1.991 | Grad_l2 --> 0.370 | Weights_l2 --> 49424.882 | Lr --> 0.003 | Seconds_per_step --> 2.037 | [2024-08-01 00:48:32,652][Main][INFO] - [train] Step 101900 out of 120000 | Loss --> 2.009 | Grad_l2 --> 0.363 | Weights_l2 --> 49444.497 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-08-01 00:51:55,600][Main][INFO] - [train] Step 102000 out of 120000 | Loss --> 2.014 | Grad_l2 --> 0.365 | Weights_l2 --> 49464.462 | Lr --> 0.003 | Seconds_per_step --> 2.029 | [2024-08-01 00:55:18,553][Main][INFO] - [train] Step 102100 out of 120000 | Loss --> 2.026 | Grad_l2 --> 0.368 | Weights_l2 --> 49484.548 | Lr --> 0.003 | Seconds_per_step --> 2.030 | [2024-08-01 00:58:41,462][Main][INFO] - [train] Step 102200 out of 120000 | Loss --> 2.033 | Grad_l2 --> 0.370 | Weights_l2 --> 49504.190 | Lr --> 0.003 | Seconds_per_step --> 2.029 | [2024-08-01 01:02:04,104][Main][INFO] - [train] Step 102300 out of 120000 | Loss --> 2.046 | Grad_l2 --> 0.363 | Weights_l2 --> 49523.944 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-08-01 01:05:29,390][Main][INFO] - [train] Step 102400 out of 120000 | Loss --> 2.041 | Grad_l2 --> 0.367 | Weights_l2 --> 49543.876 | Lr --> 0.003 | Seconds_per_step --> 2.053 | [2024-08-01 01:08:50,885][Main][INFO] - [train] Step 102500 out of 120000 | Loss --> 2.060 | Grad_l2 --> 0.370 | Weights_l2 --> 49563.162 | Lr --> 0.003 | Seconds_per_step --> 2.015 | [2024-08-01 01:12:12,590][Main][INFO] - [train] Step 102600 out of 120000 | Loss --> 2.068 | Grad_l2 --> 0.368 | Weights_l2 --> 49583.039 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-08-01 01:15:34,355][Main][INFO] - [train] Step 102700 out of 120000 | Loss --> 2.063 | Grad_l2 --> 0.368 | Weights_l2 --> 49602.708 | Lr --> 0.003 | Seconds_per_step --> 2.018 | [2024-08-01 01:18:56,544][Main][INFO] - [train] Step 102800 out of 120000 | Loss --> 2.071 | Grad_l2 --> 0.361 | Weights_l2 --> 49622.488 | Lr --> 0.003 | Seconds_per_step 
--> 2.022 | [2024-08-01 01:22:19,957][Main][INFO] - [train] Step 102900 out of 120000 | Loss --> 2.074 | Grad_l2 --> 0.370 | Weights_l2 --> 49642.264 | Lr --> 0.003 | Seconds_per_step --> 2.034 | [2024-08-01 01:25:42,255][Main][INFO] - [train] Step 103000 out of 120000 | Loss --> 2.078 | Grad_l2 --> 0.368 | Weights_l2 --> 49662.325 | Lr --> 0.003 | Seconds_per_step --> 2.023 | [2024-08-01 01:29:04,635][Main][INFO] - [train] Step 103100 out of 120000 | Loss --> 2.079 | Grad_l2 --> 0.370 | Weights_l2 --> 49681.870 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-08-01 01:32:26,587][Main][INFO] - [train] Step 103200 out of 120000 | Loss --> 2.076 | Grad_l2 --> 0.370 | Weights_l2 --> 49701.578 | Lr --> 0.003 | Seconds_per_step --> 2.020 | [2024-08-01 01:35:47,754][Main][INFO] - [train] Step 103300 out of 120000 | Loss --> 2.082 | Grad_l2 --> 0.373 | Weights_l2 --> 49720.866 | Lr --> 0.003 | Seconds_per_step --> 2.012 | [2024-08-01 01:39:10,123][Main][INFO] - [train] Step 103400 out of 120000 | Loss --> 2.073 | Grad_l2 --> 0.370 | Weights_l2 --> 49740.431 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-08-01 01:42:32,556][Main][INFO] - [train] Step 103500 out of 120000 | Loss --> 2.080 | Grad_l2 --> 0.367 | Weights_l2 --> 49760.067 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-08-01 01:45:55,554][Main][INFO] - [train] Step 103600 out of 120000 | Loss --> 2.080 | Grad_l2 --> 0.365 | Weights_l2 --> 49779.498 | Lr --> 0.003 | Seconds_per_step --> 2.030 | [2024-08-01 01:49:18,237][Main][INFO] - [train] Step 103700 out of 120000 | Loss --> 2.077 | Grad_l2 --> 0.367 | Weights_l2 --> 49798.785 | Lr --> 0.003 | Seconds_per_step --> 2.027 | [2024-08-01 01:52:43,170][Main][INFO] - [train] Step 103800 out of 120000 | Loss --> 2.059 | Grad_l2 --> 0.373 | Weights_l2 --> 49818.071 | Lr --> 0.003 | Seconds_per_step --> 2.049 | [2024-08-01 01:56:03,972][Main][INFO] - [train] Step 103900 out of 120000 | Loss --> 2.056 | Grad_l2 --> 0.369 | Weights_l2 --> 49837.512 | Lr --> 
0.003 | Seconds_per_step --> 2.008 | [2024-08-01 01:59:28,152][Main][INFO] - [train] Step 104000 out of 120000 | Loss --> 2.049 | Grad_l2 --> 0.369 | Weights_l2 --> 49857.109 | Lr --> 0.003 | Seconds_per_step --> 2.042 | [2024-08-01 02:02:52,562][Main][INFO] - [train] Step 104100 out of 120000 | Loss --> 2.058 | Grad_l2 --> 0.366 | Weights_l2 --> 49876.710 | Lr --> 0.003 | Seconds_per_step --> 2.044 | [2024-08-01 02:06:16,053][Main][INFO] - [train] Step 104200 out of 120000 | Loss --> 2.050 | Grad_l2 --> 0.367 | Weights_l2 --> 49896.560 | Lr --> 0.003 | Seconds_per_step --> 2.035 | [2024-08-01 02:09:39,311][Main][INFO] - [train] Step 104300 out of 120000 | Loss --> 2.055 | Grad_l2 --> 0.381 | Weights_l2 --> 49916.419 | Lr --> 0.003 | Seconds_per_step --> 2.033 | [2024-08-01 02:13:04,042][Main][INFO] - [train] Step 104400 out of 120000 | Loss --> 2.047 | Grad_l2 --> 0.371 | Weights_l2 --> 49936.179 | Lr --> 0.003 | Seconds_per_step --> 2.047 | [2024-08-01 02:16:24,870][Main][INFO] - [train] Step 104500 out of 120000 | Loss --> 2.038 | Grad_l2 --> 0.369 | Weights_l2 --> 49956.152 | Lr --> 0.003 | Seconds_per_step --> 2.008 | [2024-08-01 02:19:49,955][Main][INFO] - [train] Step 104600 out of 120000 | Loss --> 2.048 | Grad_l2 --> 0.368 | Weights_l2 --> 49975.900 | Lr --> 0.003 | Seconds_per_step --> 2.051 | [2024-08-01 02:23:12,981][Main][INFO] - [train] Step 104700 out of 120000 | Loss --> 2.040 | Grad_l2 --> 0.368 | Weights_l2 --> 49995.819 | Lr --> 0.003 | Seconds_per_step --> 2.030 | [2024-08-01 02:26:35,938][Main][INFO] - [train] Step 104800 out of 120000 | Loss --> 2.039 | Grad_l2 --> 0.356 | Weights_l2 --> 50015.610 | Lr --> 0.003 | Seconds_per_step --> 2.030 | [2024-08-01 02:29:59,354][Main][INFO] - [train] Step 104900 out of 120000 | Loss --> 2.046 | Grad_l2 --> 0.366 | Weights_l2 --> 50035.065 | Lr --> 0.003 | Seconds_per_step --> 2.034 | [2024-08-01 02:33:21,854][Main][INFO] - [train] Step 105000 out of 120000 | Loss --> 2.037 | Grad_l2 --> 0.367 | 
Weights_l2 --> 50054.798 | Lr --> 0.003 | Seconds_per_step --> 2.025 | [2024-08-01 02:44:22,110][Main][INFO] - [eval] Step 105000 out of 120000 | Loss --> 2.083 | Accuracy --> 0.619 | Time --> 660.253 | [2024-08-01 02:47:46,892][Main][INFO] - [train] Step 105100 out of 120000 | Loss --> 2.031 | Grad_l2 --> 0.364 | Weights_l2 --> 50074.585 | Lr --> 0.003 | Seconds_per_step --> 2.048 | [2024-08-01 02:51:10,654][Main][INFO] - [train] Step 105200 out of 120000 | Loss --> 2.029 | Grad_l2 --> 0.368 | Weights_l2 --> 50094.432 | Lr --> 0.003 | Seconds_per_step --> 2.038 | [2024-08-01 02:54:33,755][Main][INFO] - [train] Step 105300 out of 120000 | Loss --> 2.026 | Grad_l2 --> 0.371 | Weights_l2 --> 50114.077 | Lr --> 0.003 | Seconds_per_step --> 2.031 | [2024-08-01 02:57:55,863][Main][INFO] - [train] Step 105400 out of 120000 | Loss --> 2.018 | Grad_l2 --> 0.369 | Weights_l2 --> 50133.582 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-08-01 03:01:19,643][Main][INFO] - [train] Step 105500 out of 120000 | Loss --> 2.031 | Grad_l2 --> 0.367 | Weights_l2 --> 50153.315 | Lr --> 0.003 | Seconds_per_step --> 2.038 | [2024-08-01 03:04:40,152][Main][INFO] - [train] Step 105600 out of 120000 | Loss --> 2.021 | Grad_l2 --> 0.368 | Weights_l2 --> 50172.433 | Lr --> 0.003 | Seconds_per_step --> 2.005 | [2024-08-01 03:08:02,559][Main][INFO] - [train] Step 105700 out of 120000 | Loss --> 2.012 | Grad_l2 --> 0.365 | Weights_l2 --> 50191.612 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-08-01 03:11:26,671][Main][INFO] - [train] Step 105800 out of 120000 | Loss --> 2.013 | Grad_l2 --> 0.361 | Weights_l2 --> 50211.020 | Lr --> 0.003 | Seconds_per_step --> 2.041 | [2024-08-01 03:14:48,531][Main][INFO] - [train] Step 105900 out of 120000 | Loss --> 2.006 | Grad_l2 --> 0.365 | Weights_l2 --> 50230.202 | Lr --> 0.003 | Seconds_per_step --> 2.019 | [2024-08-01 03:18:10,162][Main][INFO] - [train] Step 106000 out of 120000 | Loss --> 1.992 | Grad_l2 --> 0.366 | Weights_l2 --> 50249.131 | 
Lr --> 0.003 | Seconds_per_step --> 2.016 | [2024-08-01 03:21:34,246][Main][INFO] - [train] Step 106100 out of 120000 | Loss --> 1.974 | Grad_l2 --> 0.367 | Weights_l2 --> 50267.623 | Lr --> 0.003 | Seconds_per_step --> 2.041 | [2024-08-01 03:24:54,955][Main][INFO] - [train] Step 106200 out of 120000 | Loss --> 1.984 | Grad_l2 --> 0.367 | Weights_l2 --> 50286.091 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-08-01 03:28:17,315][Main][INFO] - [train] Step 106300 out of 120000 | Loss --> 1.971 | Grad_l2 --> 0.363 | Weights_l2 --> 50304.751 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-08-01 03:31:40,763][Main][INFO] - [train] Step 106400 out of 120000 | Loss --> 1.950 | Grad_l2 --> 0.364 | Weights_l2 --> 50322.697 | Lr --> 0.003 | Seconds_per_step --> 2.034 | [2024-08-01 03:35:01,647][Main][INFO] - [train] Step 106500 out of 120000 | Loss --> 1.957 | Grad_l2 --> 0.355 | Weights_l2 --> 50341.311 | Lr --> 0.003 | Seconds_per_step --> 2.009 | [2024-08-01 03:38:24,139][Main][INFO] - [train] Step 106600 out of 120000 | Loss --> 1.961 | Grad_l2 --> 0.355 | Weights_l2 --> 50359.294 | Lr --> 0.003 | Seconds_per_step --> 2.025 | [2024-08-01 03:41:46,823][Main][INFO] - [train] Step 106700 out of 120000 | Loss --> 1.955 | Grad_l2 --> 0.354 | Weights_l2 --> 50377.426 | Lr --> 0.003 | Seconds_per_step --> 2.027 | [2024-08-01 03:45:08,200][Main][INFO] - [train] Step 106800 out of 120000 | Loss --> 1.945 | Grad_l2 --> 0.359 | Weights_l2 --> 50395.405 | Lr --> 0.003 | Seconds_per_step --> 2.014 | [2024-08-01 03:48:31,992][Main][INFO] - [train] Step 106900 out of 120000 | Loss --> 1.956 | Grad_l2 --> 0.357 | Weights_l2 --> 50413.604 | Lr --> 0.003 | Seconds_per_step --> 2.038 | [2024-08-01 03:51:53,257][Main][INFO] - [train] Step 107000 out of 120000 | Loss --> 1.952 | Grad_l2 --> 0.359 | Weights_l2 --> 50431.889 | Lr --> 0.003 | Seconds_per_step --> 2.013 | [2024-08-01 03:55:14,470][Main][INFO] - [train] Step 107100 out of 120000 | Loss --> 1.953 | Grad_l2 --> 0.356 | 
Weights_l2 --> 50449.955 | Lr --> 0.003 | Seconds_per_step --> 2.012 | [2024-08-01 03:58:36,602][Main][INFO] - [train] Step 107200 out of 120000 | Loss --> 1.945 | Grad_l2 --> 0.353 | Weights_l2 --> 50467.885 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-08-01 04:01:58,953][Main][INFO] - [train] Step 107300 out of 120000 | Loss --> 1.955 | Grad_l2 --> 0.361 | Weights_l2 --> 50485.798 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-08-01 04:05:22,483][Main][INFO] - [train] Step 107400 out of 120000 | Loss --> 1.948 | Grad_l2 --> 0.362 | Weights_l2 --> 50504.490 | Lr --> 0.003 | Seconds_per_step --> 2.035 | [2024-08-01 04:08:45,974][Main][INFO] - [train] Step 107500 out of 120000 | Loss --> 1.961 | Grad_l2 --> 0.358 | Weights_l2 --> 50522.720 | Lr --> 0.003 | Seconds_per_step --> 2.035 | [2024-08-01 04:12:07,183][Main][INFO] - [train] Step 107600 out of 120000 | Loss --> 1.969 | Grad_l2 --> 0.363 | Weights_l2 --> 50540.727 | Lr --> 0.003 | Seconds_per_step --> 2.012 | [2024-08-01 04:15:30,843][Main][INFO] - [train] Step 107700 out of 120000 | Loss --> 1.965 | Grad_l2 --> 0.358 | Weights_l2 --> 50559.317 | Lr --> 0.003 | Seconds_per_step --> 2.037 | [2024-08-01 04:18:53,963][Main][INFO] - [train] Step 107800 out of 120000 | Loss --> 1.985 | Grad_l2 --> 0.358 | Weights_l2 --> 50577.909 | Lr --> 0.003 | Seconds_per_step --> 2.031 | [2024-08-01 04:22:14,970][Main][INFO] - [train] Step 107900 out of 120000 | Loss --> 1.972 | Grad_l2 --> 0.360 | Weights_l2 --> 50596.394 | Lr --> 0.003 | Seconds_per_step --> 2.010 | [2024-08-01 04:25:39,967][Main][INFO] - [train] Step 108000 out of 120000 | Loss --> 1.987 | Grad_l2 --> 0.368 | Weights_l2 --> 50614.519 | Lr --> 0.003 | Seconds_per_step --> 2.050 | [2024-08-01 04:29:00,668][Main][INFO] - [train] Step 108100 out of 120000 | Loss --> 1.980 | Grad_l2 --> 0.363 | Weights_l2 --> 50633.091 | Lr --> 0.003 | Seconds_per_step --> 2.007 | [2024-08-01 04:32:23,032][Main][INFO] - [train] Step 108200 out of 120000 | Loss --> 
1.990 | Grad_l2 --> 0.364 | Weights_l2 --> 50651.385 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-08-01 04:35:48,785][Main][INFO] - [train] Step 108300 out of 120000 | Loss --> 1.974 | Grad_l2 --> 0.357 | Weights_l2 --> 50669.353 | Lr --> 0.003 | Seconds_per_step --> 2.058 | [2024-08-01 04:39:09,259][Main][INFO] - [train] Step 108400 out of 120000 | Loss --> 1.990 | Grad_l2 --> 0.363 | Weights_l2 --> 50686.829 | Lr --> 0.003 | Seconds_per_step --> 2.005 | [2024-08-01 04:42:33,855][Main][INFO] - [train] Step 108500 out of 120000 | Loss --> 1.978 | Grad_l2 --> 0.359 | Weights_l2 --> 50704.567 | Lr --> 0.003 | Seconds_per_step --> 2.046 | [2024-08-01 04:45:57,460][Main][INFO] - [train] Step 108600 out of 120000 | Loss --> 1.988 | Grad_l2 --> 0.360 | Weights_l2 --> 50722.114 | Lr --> 0.003 | Seconds_per_step --> 2.036 | [2024-08-01 04:49:19,569][Main][INFO] - [train] Step 108700 out of 120000 | Loss --> 1.976 | Grad_l2 --> 0.360 | Weights_l2 --> 50739.040 | Lr --> 0.003 | Seconds_per_step --> 2.021 | [2024-08-01 04:52:39,434][Main][INFO] - [train] Step 108800 out of 120000 | Loss --> 1.980 | Grad_l2 --> 0.366 | Weights_l2 --> 50755.594 | Lr --> 0.003 | Seconds_per_step --> 1.999 | [2024-08-01 04:56:06,594][Main][INFO] - [train] Step 108900 out of 120000 | Loss --> 1.979 | Grad_l2 --> 0.368 | Weights_l2 --> 50771.839 | Lr --> 0.003 | Seconds_per_step --> 2.072 | [2024-08-01 04:59:28,269][Main][INFO] - [train] Step 109000 out of 120000 | Loss --> 1.989 | Grad_l2 --> 0.360 | Weights_l2 --> 50787.929 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-08-01 05:02:49,608][Main][INFO] - [train] Step 109100 out of 120000 | Loss --> 1.981 | Grad_l2 --> 0.361 | Weights_l2 --> 50803.393 | Lr --> 0.003 | Seconds_per_step --> 2.013 | [2024-08-01 05:06:14,790][Main][INFO] - [train] Step 109200 out of 120000 | Loss --> 1.971 | Grad_l2 --> 0.361 | Weights_l2 --> 50818.803 | Lr --> 0.003 | Seconds_per_step --> 2.052 | [2024-08-01 05:09:37,692][Main][INFO] - [train] Step 109300 
out of 120000 | Loss --> 1.974 | Grad_l2 --> 0.362 | Weights_l2 --> 50833.562 | Lr --> 0.003 | Seconds_per_step --> 2.029 | [2024-08-01 05:13:00,092][Main][INFO] - [train] Step 109400 out of 120000 | Loss --> 1.973 | Grad_l2 --> 0.362 | Weights_l2 --> 50848.071 | Lr --> 0.003 | Seconds_per_step --> 2.024 | [2024-08-01 05:16:22,652][Main][INFO] - [train] Step 109500 out of 120000 | Loss --> 1.974 | Grad_l2 --> 0.358 | Weights_l2 --> 50862.342 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-08-01 05:19:42,374][Main][INFO] - [train] Step 109600 out of 120000 | Loss --> 1.977 | Grad_l2 --> 0.368 | Weights_l2 --> 50876.542 | Lr --> 0.003 | Seconds_per_step --> 1.997 | [2024-08-01 05:23:05,011][Main][INFO] - [train] Step 109700 out of 120000 | Loss --> 1.975 | Grad_l2 --> 0.363 | Weights_l2 --> 50890.018 | Lr --> 0.003 | Seconds_per_step --> 2.026 | [2024-08-01 05:26:28,559][Main][INFO] - [train] Step 109800 out of 120000 | Loss --> 1.969 | Grad_l2 --> 0.359 | Weights_l2 --> 50903.553 | Lr --> 0.003 | Seconds_per_step --> 2.035 | [2024-08-01 05:29:51,053][Main][INFO] - [train] Step 109900 out of 120000 | Loss --> 1.973 | Grad_l2 --> 0.364 | Weights_l2 --> 50917.126 | Lr --> 0.003 | Seconds_per_step --> 2.025 | [2024-08-01 05:33:12,789][Main][INFO] - [train] Step 110000 out of 120000 | Loss --> 1.967 | Grad_l2 --> 0.362 | Weights_l2 --> 50930.392 | Lr --> 0.003 | Seconds_per_step --> 2.017 | [2024-08-01 05:44:08,608][Main][INFO] - [eval] Step 110000 out of 120000 | Loss --> 2.061 | Accuracy --> 0.621 | Time --> 655.817 | [2024-08-01 05:44:08,612][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-110000 [2024-08-01 05:44:08,615][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-01 05:44:11,970][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-110000/model.safetensors [2024-08-01 05:44:12,022][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-110000/optimizer.bin [2024-08-01 05:44:12,023][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-110000/scheduler.bin [2024-08-01 05:44:12,023][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-110000/sampler.bin [2024-08-01 05:44:12,023][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-110000/sampler_1.bin [2024-08-01 05:44:12,024][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-110000/random_states_0.pkl [2024-08-01 05:47:34,190][Main][INFO] - [train] Step 110100 out of 120000 | Loss --> 1.974 | Grad_l2 --> 0.367 | Weights_l2 --> 50943.310 | Lr --> 0.003 | Seconds_per_step --> 2.056 | [2024-08-01 05:50:56,352][Main][INFO] - [train] Step 110200 out of 120000 | Loss --> 1.968 | Grad_l2 --> 0.355 | Weights_l2 --> 50955.369 | Lr --> 0.002 | Seconds_per_step --> 2.022 | [2024-08-01 05:54:19,037][Main][INFO] - [train] Step 110300 out of 120000 | Loss --> 1.988 | Grad_l2 --> 0.357 | Weights_l2 --> 50967.746 | Lr --> 0.002 | Seconds_per_step --> 2.027 | [2024-08-01 05:57:41,056][Main][INFO] - [train] Step 110400 out of 120000 | Loss --> 1.990 | Grad_l2 --> 0.367 | Weights_l2 --> 50980.001 | Lr --> 0.002 | Seconds_per_step --> 2.020 | [2024-08-01 06:01:04,501][Main][INFO] - [train] Step 110500 out of 120000 | Loss --> 1.999 | Grad_l2 --> 0.357 | Weights_l2 --> 50991.567 | Lr --> 0.002 | Seconds_per_step --> 2.034 | [2024-08-01 06:04:26,941][Main][INFO] - [train] Step 110600 out of 120000 | Loss --> 2.007 | Grad_l2 --> 0.365 | Weights_l2 --> 51003.275 | Lr --> 0.002 | Seconds_per_step --> 2.024 | [2024-08-01 06:07:49,798][Main][INFO] - [train] 
Step 110700 out of 120000 | Loss --> 2.005 | Grad_l2 --> 0.364 | Weights_l2 --> 51014.556 | Lr --> 0.002 | Seconds_per_step --> 2.029 | [2024-08-01 06:11:13,158][Main][INFO] - [train] Step 110800 out of 120000 | Loss --> 2.015 | Grad_l2 --> 0.361 | Weights_l2 --> 51025.584 | Lr --> 0.002 | Seconds_per_step --> 2.034 | [2024-08-01 06:14:37,283][Main][INFO] - [train] Step 110900 out of 120000 | Loss --> 2.025 | Grad_l2 --> 0.354 | Weights_l2 --> 51036.414 | Lr --> 0.002 | Seconds_per_step --> 2.041 | [2024-08-01 06:17:59,976][Main][INFO] - [train] Step 111000 out of 120000 | Loss --> 2.033 | Grad_l2 --> 0.357 | Weights_l2 --> 51046.891 | Lr --> 0.002 | Seconds_per_step --> 2.027 | [2024-08-01 06:21:20,221][Main][INFO] - [train] Step 111100 out of 120000 | Loss --> 2.039 | Grad_l2 --> 0.367 | Weights_l2 --> 51057.070 | Lr --> 0.002 | Seconds_per_step --> 2.002 | [2024-08-01 06:24:41,505][Main][INFO] - [train] Step 111200 out of 120000 | Loss --> 2.040 | Grad_l2 --> 0.364 | Weights_l2 --> 51067.155 | Lr --> 0.002 | Seconds_per_step --> 2.013 | [2024-08-01 06:28:06,435][Main][INFO] - [train] Step 111300 out of 120000 | Loss --> 2.046 | Grad_l2 --> 0.363 | Weights_l2 --> 51076.826 | Lr --> 0.002 | Seconds_per_step --> 2.049 | [2024-08-01 06:31:27,593][Main][INFO] - [train] Step 111400 out of 120000 | Loss --> 2.055 | Grad_l2 --> 0.368 | Weights_l2 --> 51086.332 | Lr --> 0.002 | Seconds_per_step --> 2.012 | [2024-08-01 06:34:50,641][Main][INFO] - [train] Step 111500 out of 120000 | Loss --> 2.062 | Grad_l2 --> 0.361 | Weights_l2 --> 51095.528 | Lr --> 0.002 | Seconds_per_step --> 2.030 | [2024-08-01 06:38:13,581][Main][INFO] - [train] Step 111600 out of 120000 | Loss --> 2.070 | Grad_l2 --> 0.363 | Weights_l2 --> 51104.531 | Lr --> 0.002 | Seconds_per_step --> 2.029 | [2024-08-01 06:41:34,670][Main][INFO] - [train] Step 111700 out of 120000 | Loss --> 2.044 | Grad_l2 --> 0.363 | Weights_l2 --> 51113.378 | Lr --> 0.002 | Seconds_per_step --> 2.011 | [2024-08-01 
06:44:56,490][Main][INFO] - [train] Step 111800 out of 120000 | Loss --> 2.049 | Grad_l2 --> 0.358 | Weights_l2 --> 51121.790 | Lr --> 0.002 | Seconds_per_step --> 2.018 | [2024-08-01 06:48:18,757][Main][INFO] - [train] Step 111900 out of 120000 | Loss --> 2.051 | Grad_l2 --> 0.360 | Weights_l2 --> 51129.648 | Lr --> 0.002 | Seconds_per_step --> 2.023 | [2024-08-01 06:51:38,782][Main][INFO] - [train] Step 112000 out of 120000 | Loss --> 2.040 | Grad_l2 --> 0.363 | Weights_l2 --> 51137.563 | Lr --> 0.002 | Seconds_per_step --> 2.000 | [2024-08-01 06:54:59,990][Main][INFO] - [train] Step 112100 out of 120000 | Loss --> 2.048 | Grad_l2 --> 0.350 | Weights_l2 --> 51145.417 | Lr --> 0.002 | Seconds_per_step --> 2.012 | [2024-08-01 06:58:24,751][Main][INFO] - [train] Step 112200 out of 120000 | Loss --> 2.057 | Grad_l2 --> 0.359 | Weights_l2 --> 51153.079 | Lr --> 0.002 | Seconds_per_step --> 2.048 | [2024-08-01 07:01:45,233][Main][INFO] - [train] Step 112300 out of 120000 | Loss --> 2.065 | Grad_l2 --> 0.361 | Weights_l2 --> 51160.501 | Lr --> 0.002 | Seconds_per_step --> 2.005 | [2024-08-01 07:05:07,138][Main][INFO] - [train] Step 112400 out of 120000 | Loss --> 2.056 | Grad_l2 --> 0.358 | Weights_l2 --> 51167.436 | Lr --> 0.002 | Seconds_per_step --> 2.019 | [2024-08-01 07:08:30,581][Main][INFO] - [train] Step 112500 out of 120000 | Loss --> 2.058 | Grad_l2 --> 0.353 | Weights_l2 --> 51174.130 | Lr --> 0.002 | Seconds_per_step --> 2.034 | [2024-08-01 07:11:50,556][Main][INFO] - [train] Step 112600 out of 120000 | Loss --> 2.045 | Grad_l2 --> 0.362 | Weights_l2 --> 51180.745 | Lr --> 0.002 | Seconds_per_step --> 2.000 | [2024-08-01 07:15:13,190][Main][INFO] - [train] Step 112700 out of 120000 | Loss --> 2.030 | Grad_l2 --> 0.348 | Weights_l2 --> 51187.327 | Lr --> 0.002 | Seconds_per_step --> 2.026 | [2024-08-01 07:18:36,351][Main][INFO] - [train] Step 112800 out of 120000 | Loss --> 2.005 | Grad_l2 --> 0.352 | Weights_l2 --> 51193.730 | Lr --> 0.002 | Seconds_per_step 
--> 2.032 | [2024-08-01 07:21:57,986][Main][INFO] - [train] Step 112900 out of 120000 | Loss --> 1.999 | Grad_l2 --> 0.357 | Weights_l2 --> 51199.851 | Lr --> 0.002 | Seconds_per_step --> 2.016 | [2024-08-01 07:25:21,255][Main][INFO] - [train] Step 113000 out of 120000 | Loss --> 1.991 | Grad_l2 --> 0.355 | Weights_l2 --> 51205.725 | Lr --> 0.002 | Seconds_per_step --> 2.033 | [2024-08-01 07:28:42,781][Main][INFO] - [train] Step 113100 out of 120000 | Loss --> 1.982 | Grad_l2 --> 0.353 | Weights_l2 --> 51211.452 | Lr --> 0.002 | Seconds_per_step --> 2.015 | [2024-08-01 07:32:04,572][Main][INFO] - [train] Step 113200 out of 120000 | Loss --> 1.965 | Grad_l2 --> 0.350 | Weights_l2 --> 51217.003 | Lr --> 0.002 | Seconds_per_step --> 2.018 | [2024-08-01 07:35:25,078][Main][INFO] - [train] Step 113300 out of 120000 | Loss --> 1.956 | Grad_l2 --> 0.346 | Weights_l2 --> 51222.400 | Lr --> 0.002 | Seconds_per_step --> 2.005 | [2024-08-01 07:38:45,287][Main][INFO] - [train] Step 113400 out of 120000 | Loss --> 1.943 | Grad_l2 --> 0.350 | Weights_l2 --> 51227.485 | Lr --> 0.002 | Seconds_per_step --> 2.002 | [2024-08-01 07:42:08,095][Main][INFO] - [train] Step 113500 out of 120000 | Loss --> 1.934 | Grad_l2 --> 0.353 | Weights_l2 --> 51232.387 | Lr --> 0.002 | Seconds_per_step --> 2.028 | [2024-08-01 07:45:27,453][Main][INFO] - [train] Step 113600 out of 120000 | Loss --> 1.929 | Grad_l2 --> 0.350 | Weights_l2 --> 51237.124 | Lr --> 0.002 | Seconds_per_step --> 1.994 | [2024-08-01 07:48:51,905][Main][INFO] - [train] Step 113700 out of 120000 | Loss --> 1.917 | Grad_l2 --> 0.350 | Weights_l2 --> 51241.661 | Lr --> 0.002 | Seconds_per_step --> 2.045 | [2024-08-01 07:52:13,788][Main][INFO] - [train] Step 113800 out of 120000 | Loss --> 1.920 | Grad_l2 --> 0.348 | Weights_l2 --> 51245.999 | Lr --> 0.002 | Seconds_per_step --> 2.019 | [2024-08-01 07:55:36,956][Main][INFO] - [train] Step 113900 out of 120000 | Loss --> 1.910 | Grad_l2 --> 0.354 | Weights_l2 --> 51250.254 | Lr --> 
0.002 | Seconds_per_step --> 2.032 | [2024-08-01 07:58:58,568][Main][INFO] - [train] Step 114000 out of 120000 | Loss --> 1.914 | Grad_l2 --> 0.347 | Weights_l2 --> 51254.366 | Lr --> 0.002 | Seconds_per_step --> 2.016 | [2024-08-01 08:02:21,270][Main][INFO] - [train] Step 114100 out of 120000 | Loss --> 1.929 | Grad_l2 --> 0.344 | Weights_l2 --> 51258.301 | Lr --> 0.001 | Seconds_per_step --> 2.027 | [2024-08-01 08:05:45,158][Main][INFO] - [train] Step 114200 out of 120000 | Loss --> 1.931 | Grad_l2 --> 0.351 | Weights_l2 --> 51262.225 | Lr --> 0.001 | Seconds_per_step --> 2.039 | [2024-08-01 08:09:09,581][Main][INFO] - [train] Step 114300 out of 120000 | Loss --> 1.948 | Grad_l2 --> 0.354 | Weights_l2 --> 51265.907 | Lr --> 0.001 | Seconds_per_step --> 2.044 | [2024-08-01 08:12:29,773][Main][INFO] - [train] Step 114400 out of 120000 | Loss --> 1.939 | Grad_l2 --> 0.347 | Weights_l2 --> 51269.662 | Lr --> 0.001 | Seconds_per_step --> 2.002 | [2024-08-01 08:15:54,773][Main][INFO] - [train] Step 114500 out of 120000 | Loss --> 1.939 | Grad_l2 --> 0.353 | Weights_l2 --> 51273.274 | Lr --> 0.001 | Seconds_per_step --> 2.050 | [2024-08-01 08:19:17,855][Main][INFO] - [train] Step 114600 out of 120000 | Loss --> 1.954 | Grad_l2 --> 0.352 | Weights_l2 --> 51276.401 | Lr --> 0.001 | Seconds_per_step --> 2.031 | [2024-08-01 08:22:37,953][Main][INFO] - [train] Step 114700 out of 120000 | Loss --> 1.958 | Grad_l2 --> 0.343 | Weights_l2 --> 51279.533 | Lr --> 0.001 | Seconds_per_step --> 2.001 | [2024-08-01 08:26:01,656][Main][INFO] - [train] Step 114800 out of 120000 | Loss --> 1.947 | Grad_l2 --> 0.347 | Weights_l2 --> 51282.557 | Lr --> 0.001 | Seconds_per_step --> 2.037 | [2024-08-01 08:29:25,970][Main][INFO] - [train] Step 114900 out of 120000 | Loss --> 1.947 | Grad_l2 --> 0.343 | Weights_l2 --> 51285.436 | Lr --> 0.001 | Seconds_per_step --> 2.043 | [2024-08-01 08:32:49,753][Main][INFO] - [train] Step 115000 out of 120000 | Loss --> 1.970 | Grad_l2 --> 0.355 | 
Weights_l2 --> 51288.131 | Lr --> 0.001 | Seconds_per_step --> 2.038 | [2024-08-01 08:43:50,322][Main][INFO] - [eval] Step 115000 out of 120000 | Loss --> 2.010 | Accuracy --> 0.628 | Time --> 660.567 | [2024-08-01 08:47:14,138][Main][INFO] - [train] Step 115100 out of 120000 | Loss --> 1.932 | Grad_l2 --> 0.341 | Weights_l2 --> 51290.914 | Lr --> 0.001 | Seconds_per_step --> 2.038 | [2024-08-01 08:50:38,555][Main][INFO] - [train] Step 115200 out of 120000 | Loss --> 1.949 | Grad_l2 --> 0.353 | Weights_l2 --> 51293.441 | Lr --> 0.001 | Seconds_per_step --> 2.044 | [2024-08-01 08:54:04,591][Main][INFO] - [train] Step 115300 out of 120000 | Loss --> 1.945 | Grad_l2 --> 0.347 | Weights_l2 --> 51295.929 | Lr --> 0.001 | Seconds_per_step --> 2.060 | [2024-08-01 08:57:25,333][Main][INFO] - [train] Step 115400 out of 120000 | Loss --> 1.947 | Grad_l2 --> 0.352 | Weights_l2 --> 51298.388 | Lr --> 0.001 | Seconds_per_step --> 2.007 | [2024-08-01 09:00:46,281][Main][INFO] - [train] Step 115500 out of 120000 | Loss --> 1.961 | Grad_l2 --> 0.344 | Weights_l2 --> 51300.732 | Lr --> 0.001 | Seconds_per_step --> 2.009 | [2024-08-01 09:04:08,439][Main][INFO] - [train] Step 115600 out of 120000 | Loss --> 1.945 | Grad_l2 --> 0.348 | Weights_l2 --> 51302.803 | Lr --> 0.001 | Seconds_per_step --> 2.022 | [2024-08-01 09:07:34,058][Main][INFO] - [train] Step 115700 out of 120000 | Loss --> 1.952 | Grad_l2 --> 0.345 | Weights_l2 --> 51304.794 | Lr --> 0.001 | Seconds_per_step --> 2.056 | [2024-08-01 09:10:53,179][Main][INFO] - [train] Step 115800 out of 120000 | Loss --> 1.947 | Grad_l2 --> 0.351 | Weights_l2 --> 51306.677 | Lr --> 0.001 | Seconds_per_step --> 1.991 | [2024-08-01 09:14:18,165][Main][INFO] - [train] Step 115900 out of 120000 | Loss --> 1.938 | Grad_l2 --> 0.346 | Weights_l2 --> 51308.638 | Lr --> 0.001 | Seconds_per_step --> 2.050 | [2024-08-01 09:17:40,355][Main][INFO] - [train] Step 116000 out of 120000 | Loss --> 1.945 | Grad_l2 --> 0.348 | Weights_l2 --> 51310.246 | 
Lr --> 0.001 | Seconds_per_step --> 2.022 | [2024-08-01 09:21:01,682][Main][INFO] - [train] Step 116100 out of 120000 | Loss --> 1.925 | Grad_l2 --> 0.344 | Weights_l2 --> 51311.829 | Lr --> 0.001 | Seconds_per_step --> 2.013 | [2024-08-01 09:24:24,235][Main][INFO] - [train] Step 116200 out of 120000 | Loss --> 1.909 | Grad_l2 --> 0.345 | Weights_l2 --> 51313.170 | Lr --> 0.001 | Seconds_per_step --> 2.026 | [2024-08-01 09:27:46,848][Main][INFO] - [train] Step 116300 out of 120000 | Loss --> 1.901 | Grad_l2 --> 0.342 | Weights_l2 --> 51314.432 | Lr --> 0.001 | Seconds_per_step --> 2.026 | [2024-08-01 09:31:07,382][Main][INFO] - [train] Step 116400 out of 120000 | Loss --> 1.909 | Grad_l2 --> 0.341 | Weights_l2 --> 51315.581 | Lr --> 0.001 | Seconds_per_step --> 2.005 | [2024-08-01 09:34:32,470][Main][INFO] - [train] Step 116500 out of 120000 | Loss --> 1.907 | Grad_l2 --> 0.338 | Weights_l2 --> 51316.762 | Lr --> 0.001 | Seconds_per_step --> 2.051 | [2024-08-01 09:37:53,967][Main][INFO] - [train] Step 116600 out of 120000 | Loss --> 1.911 | Grad_l2 --> 0.342 | Weights_l2 --> 51317.801 | Lr --> 0.001 | Seconds_per_step --> 2.015 | [2024-08-01 09:41:14,990][Main][INFO] - [train] Step 116700 out of 120000 | Loss --> 1.898 | Grad_l2 --> 0.348 | Weights_l2 --> 51318.825 | Lr --> 0.001 | Seconds_per_step --> 2.010 | [2024-08-01 09:44:36,990][Main][INFO] - [train] Step 116800 out of 120000 | Loss --> 1.903 | Grad_l2 --> 0.346 | Weights_l2 --> 51319.730 | Lr --> 0.001 | Seconds_per_step --> 2.020 | [2024-08-01 09:47:58,971][Main][INFO] - [train] Step 116900 out of 120000 | Loss --> 1.908 | Grad_l2 --> 0.346 | Weights_l2 --> 51320.603 | Lr --> 0.001 | Seconds_per_step --> 2.020 | [2024-08-01 09:51:20,950][Main][INFO] - [train] Step 117000 out of 120000 | Loss --> 1.887 | Grad_l2 --> 0.344 | Weights_l2 --> 51321.424 | Lr --> 0.001 | Seconds_per_step --> 2.020 | [2024-08-01 09:54:44,061][Main][INFO] - [train] Step 117100 out of 120000 | Loss --> 1.907 | Grad_l2 --> 0.338 | 
Weights_l2 --> 51322.066 | Lr --> 0.001 | Seconds_per_step --> 2.031 | [2024-08-01 09:58:05,143][Main][INFO] - [train] Step 117200 out of 120000 | Loss --> 1.917 | Grad_l2 --> 0.342 | Weights_l2 --> 51322.679 | Lr --> 0.001 | Seconds_per_step --> 2.011 | [2024-08-01 10:01:26,593][Main][INFO] - [train] Step 117300 out of 120000 | Loss --> 1.910 | Grad_l2 --> 0.338 | Weights_l2 --> 51323.215 | Lr --> 0.001 | Seconds_per_step --> 2.014 | [2024-08-01 10:04:50,761][Main][INFO] - [train] Step 117400 out of 120000 | Loss --> 1.892 | Grad_l2 --> 0.338 | Weights_l2 --> 51323.658 | Lr --> 0.001 | Seconds_per_step --> 2.042 | [2024-08-01 10:08:11,537][Main][INFO] - [train] Step 117500 out of 120000 | Loss --> 1.905 | Grad_l2 --> 0.345 | Weights_l2 --> 51324.070 | Lr --> 0.001 | Seconds_per_step --> 2.008 | [2024-08-01 10:11:35,634][Main][INFO] - [train] Step 117600 out of 120000 | Loss --> 1.902 | Grad_l2 --> 0.340 | Weights_l2 --> 51324.443 | Lr --> 0.001 | Seconds_per_step --> 2.041 | [2024-08-01 10:15:02,544][Main][INFO] - [train] Step 117700 out of 120000 | Loss --> 1.914 | Grad_l2 --> 0.339 | Weights_l2 --> 51324.668 | Lr --> 0.001 | Seconds_per_step --> 2.069 | [2024-08-01 10:18:23,185][Main][INFO] - [train] Step 117800 out of 120000 | Loss --> 1.921 | Grad_l2 --> 0.340 | Weights_l2 --> 51324.964 | Lr --> 0.001 | Seconds_per_step --> 2.006 | [2024-08-01 10:21:47,722][Main][INFO] - [train] Step 117900 out of 120000 | Loss --> 1.921 | Grad_l2 --> 0.345 | Weights_l2 --> 51325.233 | Lr --> 0.001 | Seconds_per_step --> 2.045 | [2024-08-01 10:25:08,344][Main][INFO] - [train] Step 118000 out of 120000 | Loss --> 1.922 | Grad_l2 --> 0.346 | Weights_l2 --> 51325.418 | Lr --> 0.001 | Seconds_per_step --> 2.006 | [2024-08-01 10:28:28,858][Main][INFO] - [train] Step 118100 out of 120000 | Loss --> 1.935 | Grad_l2 --> 0.347 | Weights_l2 --> 51325.538 | Lr --> 0.000 | Seconds_per_step --> 2.005 | [2024-08-01 10:31:54,552][Main][INFO] - [train] Step 118200 out of 120000 | Loss --> 
1.938 | Grad_l2 --> 0.348 | Weights_l2 --> 51325.687 | Lr --> 0.000 | Seconds_per_step --> 2.057 | [2024-08-01 10:35:15,068][Main][INFO] - [train] Step 118300 out of 120000 | Loss --> 1.955 | Grad_l2 --> 0.340 | Weights_l2 --> 51325.754 | Lr --> 0.000 | Seconds_per_step --> 2.005 | [2024-08-01 10:38:38,952][Main][INFO] - [train] Step 118400 out of 120000 | Loss --> 1.951 | Grad_l2 --> 0.351 | Weights_l2 --> 51325.834 | Lr --> 0.000 | Seconds_per_step --> 2.039 | [2024-08-01 10:42:00,439][Main][INFO] - [train] Step 118500 out of 120000 | Loss --> 1.938 | Grad_l2 --> 0.345 | Weights_l2 --> 51325.833 | Lr --> 0.000 | Seconds_per_step --> 2.015 | [2024-08-01 10:45:21,051][Main][INFO] - [train] Step 118600 out of 120000 | Loss --> 1.958 | Grad_l2 --> 0.340 | Weights_l2 --> 51325.803 | Lr --> 0.000 | Seconds_per_step --> 2.006 | [2024-08-01 10:48:44,946][Main][INFO] - [train] Step 118700 out of 120000 | Loss --> 1.951 | Grad_l2 --> 0.338 | Weights_l2 --> 51325.786 | Lr --> 0.000 | Seconds_per_step --> 2.039 | [2024-08-01 10:52:08,807][Main][INFO] - [train] Step 118800 out of 120000 | Loss --> 1.954 | Grad_l2 --> 0.343 | Weights_l2 --> 51325.765 | Lr --> 0.000 | Seconds_per_step --> 2.039 | [2024-08-01 10:55:28,500][Main][INFO] - [train] Step 118900 out of 120000 | Loss --> 1.952 | Grad_l2 --> 0.338 | Weights_l2 --> 51325.733 | Lr --> 0.000 | Seconds_per_step --> 1.997 | [2024-08-01 10:58:51,967][Main][INFO] - [train] Step 119000 out of 120000 | Loss --> 1.945 | Grad_l2 --> 0.342 | Weights_l2 --> 51325.686 | Lr --> 0.000 | Seconds_per_step --> 2.035 | [2024-08-01 11:02:14,636][Main][INFO] - [train] Step 119100 out of 120000 | Loss --> 1.941 | Grad_l2 --> 0.343 | Weights_l2 --> 51325.607 | Lr --> 0.000 | Seconds_per_step --> 2.027 | [2024-08-01 11:05:35,182][Main][INFO] - [train] Step 119200 out of 120000 | Loss --> 1.934 | Grad_l2 --> 0.336 | Weights_l2 --> 51325.503 | Lr --> 0.000 | Seconds_per_step --> 2.005 | [2024-08-01 11:08:57,124][Main][INFO] - [train] Step 119300 
out of 120000 | Loss --> 1.930 | Grad_l2 --> 0.341 | Weights_l2 --> 51325.393 | Lr --> 0.000 | Seconds_per_step --> 2.019 | [2024-08-01 11:12:19,367][Main][INFO] - [train] Step 119400 out of 120000 | Loss --> 1.926 | Grad_l2 --> 0.339 | Weights_l2 --> 51325.312 | Lr --> 0.000 | Seconds_per_step --> 2.022 | [2024-08-01 11:15:40,887][Main][INFO] - [train] Step 119500 out of 120000 | Loss --> 1.919 | Grad_l2 --> 0.337 | Weights_l2 --> 51325.240 | Lr --> 0.000 | Seconds_per_step --> 2.015 | [2024-08-01 11:19:04,345][Main][INFO] - [train] Step 119600 out of 120000 | Loss --> 1.925 | Grad_l2 --> 0.338 | Weights_l2 --> 51325.167 | Lr --> 0.000 | Seconds_per_step --> 2.035 | [2024-08-01 11:22:26,236][Main][INFO] - [train] Step 119700 out of 120000 | Loss --> 1.941 | Grad_l2 --> 0.343 | Weights_l2 --> 51325.105 | Lr --> 0.000 | Seconds_per_step --> 2.019 | [2024-08-01 11:25:48,782][Main][INFO] - [train] Step 119800 out of 120000 | Loss --> 1.923 | Grad_l2 --> 0.339 | Weights_l2 --> 51325.052 | Lr --> 0.000 | Seconds_per_step --> 2.025 | [2024-08-01 11:29:10,954][Main][INFO] - [train] Step 119900 out of 120000 | Loss --> 1.927 | Grad_l2 --> 0.339 | Weights_l2 --> 51325.021 | Lr --> 0.000 | Seconds_per_step --> 2.022 | [2024-08-01 11:32:31,939][Main][INFO] - [train] Step 120000 out of 120000 | Loss --> 1.924 | Grad_l2 --> 0.346 | Weights_l2 --> 51325.008 | Lr --> 0.000 | Seconds_per_step --> 2.010 | [2024-08-01 11:43:36,531][Main][INFO] - [eval] Step 120000 out of 120000 | Loss --> 1.974 | Accuracy --> 0.634 | Time --> 664.591 | [2024-08-01 11:43:36,535][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-120000 [2024-08-01 11:43:36,538][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-01 11:43:39,834][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-120000/model.safetensors [2024-08-01 11:43:39,888][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-120000/optimizer.bin [2024-08-01 11:43:39,890][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-120000/scheduler.bin [2024-08-01 11:43:39,890][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-120000/sampler.bin [2024-08-01 11:43:39,891][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-120000/sampler_1.bin [2024-08-01 11:43:39,893][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-120000/random_states_0.pkl [2024-08-01 11:54:47,092][Main][INFO] - [eval] Step 120001 out of 120000 | Loss --> 1.972 | Accuracy --> 0.634 | Time --> 666.103 | [2024-08-01 11:54:47,095][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-120001 [2024-08-01 11:54:47,098][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-01 11:54:50,345][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-120001/model.safetensors [2024-08-01 11:54:50,396][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-120001/optimizer.bin [2024-08-01 11:54:50,397][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-120001/scheduler.bin [2024-08-01 11:54:50,397][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-120001/sampler.bin [2024-08-01 11:54:50,397][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-120001/sampler_1.bin [2024-08-01 11:54:50,399][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-120001/random_states_0.pkl