0%| | 0/509 [00:00> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:05,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:08,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.7829, 'learning_rate': 0.0, 'epoch': 0.0} [WARNING|modeling_utils.py:388] 2022-02-28 23:14:11,125 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 1/509 [00:12<1:47:55, 12.75s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:14:14,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:17,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:19,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.9181, 'learning_rate': 2e-08, 'epoch': 0.0} [WARNING|modeling_utils.py:388] 2022-02-28 23:14:22,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 2/509 [00:24<1:43:57, 12.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:14:26,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:29,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:32,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:34,924 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 1%|▍ | 3/509 [00:36<1:42:03, 12.10s/it] 1%|▍ | 3/509 [00:36<1:42:03, 12.10s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:14:37,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:40,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:43,661 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.808, 'learning_rate': 6.000000000000001e-08, 'epoch': 0.01} [WARNING|modeling_utils.py:388] 2022-02-28 23:14:46,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▋ | 4/509 [00:48<1:40:12, 11.91s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:14:49,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:52,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:14:55,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.7888, 'learning_rate': 8e-08, 'epoch': 0.01} [WARNING|modeling_utils.py:388] 2022-02-28 23:14:58,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▊ | 5/509 [00:59<1:38:52, 11.77s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:15:01,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:03,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 
23:15:06,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:09,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▉ | 6/509 [01:11<1:37:51, 11.67s/it] 1%|▉ | 6/509 [01:11<1:37:51, 11.67s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:15:12,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:15,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:18,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.8279, 'learning_rate': 1.2000000000000002e-07, 'epoch': 0.01} [WARNING|modeling_utils.py:388] 2022-02-28 23:15:20,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|█ | 7/509 [01:22<1:36:42, 11.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:15:23,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:26,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:29,325 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:32,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 2%|█▎ | 8/509 [01:33<1:35:35, 11.45s/it] 2%|█▎ | 8/509 [01:33<1:35:35, 11.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:15:34,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
[WARNING|modeling_utils.py:388] 2022-02-28 23:15:37,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:40,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.8801, 'learning_rate': 1.6e-07, 'epoch': 0.02} [WARNING|modeling_utils.py:388] 2022-02-28 23:15:43,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 2%|█▍ | 9/509 [01:45<1:35:01, 11.40s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:15:46,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:49,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:51,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:15:54,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 2%|█▌ | 10/509 [01:56<1:34:25, 11.35s/it] 2%|█▌ | 10/509 [01:56<1:34:25, 11.35s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:15:57,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:00,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:02,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:05,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.8581, 'learning_rate': 2.0000000000000002e-07, 
'epoch': 0.02} 2%|█▋ | 11/509 [02:07<1:33:22, 11.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:16:08,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:11,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:13,924 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:16,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 2%|█▉ | 12/509 [02:18<1:32:26, 11.16s/it] 2%|█▉ | 12/509 [02:18<1:32:26, 11.16s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:16:19,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:22,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:24,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.8535, 'learning_rate': 2.4000000000000003e-07, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-02-28 23:16:27,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 3%|██ | 13/509 [02:29<1:31:38, 11.09s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:16:30,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:33,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:35,712 >> Could not estimate the number of tokens of the input, floating-point operations will 
not be computed {'loss': 4.7933, 'learning_rate': 2.6e-07, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-02-28 23:16:38,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 3%|██▏ | 14/509 [02:40<1:30:47, 11.01s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:16:41,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:43,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:46,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.7005, 'learning_rate': 2.8e-07, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-02-28 23:16:49,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 3%|██▎ | 15/509 [02:50<1:30:20, 10.97s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:16:51,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:54,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:16:57,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.8842, 'learning_rate': 3.0000000000000004e-07, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-02-28 23:16:59,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 3%|██▌ | 16/509 [03:01<1:29:20, 10.87s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:17:02,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:05,256 >> Could not 
estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:07,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.8656, 'learning_rate': 3.2e-07, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-02-28 23:17:10,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 3%|██▋ | 17/509 [03:12<1:28:41, 10.82s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:17:13,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:15,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:18,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.925, 'learning_rate': 3.4000000000000003e-07, 'epoch': 0.04} [WARNING|modeling_utils.py:388] 2022-02-28 23:17:21,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 4%|██▊ | 18/509 [03:22<1:27:54, 10.74s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:17:23,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:26,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:29,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:31,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 4%|██▉ | 19/509 [03:33<1:27:23, 10.70s/it] 4%|██▉ | 19/509 [03:33<1:27:23, 
10.70s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:17:34,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:37,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:39,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.9827, 'learning_rate': 3.8e-07, 'epoch': 0.04} [WARNING|modeling_utils.py:388] 2022-02-28 23:17:42,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 4%|███▏ | 20/509 [03:43<1:26:37, 10.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:17:44,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:47,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:49,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:52,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.9152, 'learning_rate': 4.0000000000000003e-07, 'epoch': 0.04} 4%|███▎ | 21/509 [03:54<1:25:40, 10.53s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:17:55,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:17:57,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:00,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
[WARNING|modeling_utils.py:388] 2022-02-28 23:18:02,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 4%|███▍ | 22/509 [04:04<1:24:39, 10.43s/it] 4%|███▍ | 22/509 [04:04<1:24:39, 10.43s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:18:05,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:07,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:10,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:12,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 5%|███▌ | 23/509 [04:14<1:23:47, 10.34s/it] 5%|███▌ | 23/509 [04:14<1:23:47, 10.34s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:18:15,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:18,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:20,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:23,068 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 5%|███▊ | 24/509 [04:24<1:23:20, 10.31s/it] 5%|███▊ | 24/509 [04:24<1:23:20, 10.31s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:18:25,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:28,096 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:30,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:33,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.689, 'learning_rate': 4.800000000000001e-07, 'epoch': 0.05} 5%|███▉ | 25/509 [04:35<1:23:34, 10.36s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:18:36,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:38,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:18:36,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:41,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:18:36,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 5%|████ | 26/509 [04:45<1:22:37, 10.26s/it]g-point operations will not be computed-28 23:18:36,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 5%|████ | 26/509 [04:45<1:22:37, 10.26s/it]g-point operations will not be computed-28 23:18:36,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 5%|████ | 26/509 [04:45<1:22:37, 10.26s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:18:46,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:48,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:18:46,150 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:51,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:18:46,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 5%|████▏ | 27/509 [04:55<1:21:30, 10.15s/it]g-point operations will not be computed-28 23:18:46,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 5%|████▏ | 27/509 [04:55<1:21:30, 10.15s/it]g-point operations will not be computed-28 23:18:46,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 5%|████▏ | 27/509 [04:55<1:21:30, 10.15s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:18:56,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:18:58,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:18:56,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:00,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:18:56,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:00,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:18:56,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|████▍ | 28/509 [05:04<1:20:31, 10.05s/it]g-point operations will not be computed-28 23:18:56,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed 6%|████▍ | 28/509 [05:04<1:20:31, 10.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:19:05,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:08,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:05,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:10,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:05,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:12,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:05,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:12,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:05,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|████▌ | 29/509 [05:14<1:19:36, 9.95s/it]g-point operations will not be computed-28 23:19:05,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|████▌ | 29/509 [05:14<1:19:36, 9.95s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:19:15,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:17,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:15,475 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:20,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:15,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:20,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:15,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|████▋ | 30/509 [05:24<1:18:36, 9.85s/it]g-point operations will not be computed-28 23:19:15,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|████▋ | 30/509 [05:24<1:18:36, 9.85s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:19:25,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:27,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:25,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:29,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:25,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:29,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:25,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|████▊ | 31/509 [05:33<1:17:40, 9.75s/it]g-point operations will not be computed-28 23:19:25,095 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed 6%|████▊ | 31/509 [05:33<1:17:40, 9.75s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:19:34,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:36,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:34,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:39,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:34,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|█████ | 32/509 [05:43<1:16:50, 9.67s/it]g-point operations will not be computed-28 23:19:34,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|█████ | 32/509 [05:43<1:16:50, 9.67s/it]g-point operations will not be computed-28 23:19:34,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|█████ | 32/509 [05:43<1:16:50, 9.67s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:19:44,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:46,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:44,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:48,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:44,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 
2022-02-28 23:19:48,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:44,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|█████▏ | 33/509 [05:52<1:15:53, 9.57s/it]g-point operations will not be computed-28 23:19:44,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 6%|█████▏ | 33/509 [05:52<1:15:53, 9.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:19:53,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:55,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:53,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:19:57,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:19:53,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:19:53,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:19:53,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 7%|█████▎ | 34/509 [06:01<1:14:29, 9.41s/it]g-point operations will not be computed-28 23:19:53,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:04,524 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:02,335 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:06,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:02,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:06,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:02,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 7%|█████▌ | 35/509 [06:10<1:13:09, 9.26s/it]g-point operations will not be computed-28 23:20:02,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 7%|█████▌ | 35/509 [06:10<1:13:09, 9.26s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:20:11,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:13,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:11,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:15,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:11,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 7%|█████▋ | 36/509 [06:19<1:11:44, 9.10s/it]g-point operations will not be computed-28 23:20:11,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 7%|█████▋ | 36/509 [06:19<1:11:44, 9.10s/it]g-point operations will not be computed-28 23:20:11,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 7%|█████▋ | 36/509 [06:19<1:11:44, 
9.10s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:20:19,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:21,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:19,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:24,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:19,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:24,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:19,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 7%|█████▊ | 37/509 [06:27<1:10:02, 8.90s/it]g-point operations will not be computed-28 23:20:19,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 7%|█████▊ | 37/509 [06:27<1:10:02, 8.90s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:20:28,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:30,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:28,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:32,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:28,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
7%|█████▉ | 38/509 [06:35<1:08:21, 8.71s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:20:36,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:38,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:40,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▏ | 39/509 [06:43<1:06:31, 8.49s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:20:44,356 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:46,271 >> 
Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:48,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▎ | 40/509 [06:51<1:04:36, 8.26s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:20:51,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:53,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:20:57,369 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed 8%|██████▍ | 41/509 [06:59<1:02:17, 7.99s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:20:59,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:00,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:02,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▊ | 42/509 [07:05<59:38, 7.66s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:06,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:07,619 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:09,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
{'loss': 4.6677, 'learning_rate': 8.200000000000001e-07, 'epoch': 0.08} [WARNING|modeling_utils.py:388] 2022-02-28 23:21:10,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▉ | 43/509 [07:12<56:33, 7.28s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:12,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:13,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:15,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████ | 44/509 [07:18<53:06, 6.85s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:17,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▏ | 45/509 
[07:23<49:21, 6.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:23,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:25,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▍ | 46/509 [07:28<45:22, 5.88s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:27,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:29,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▌ | 47/509 [07:32<41:08, 5.34s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:31,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:33,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
9%|███████▋ | 48/509 [07:35<37:01, 4.82s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:35,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 49/509 [07:38<32:57, 4.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:38,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:39,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████ | 50/509 [07:42<30:01, 3.93s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:43,672 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:21:49,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████▏ | 51/509 [07:54<49:29, 6.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:21:55,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:22:01,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████▏ | 52/509 [08:06<1:01:10, 
8.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:22:07,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:22:13,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████▎ | 53/509 [08:17<1:09:29, 9.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:22:19,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:22:24,871 >> Could not estimate the number of tokens of the 
input, floating-point operations will not be computed 11%|████████▍ | 54/509 [08:29<1:14:50, 9.87s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:22:30,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:22:36,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▋ | 55/509 [08:40<1:18:29, 10.37s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:22:42,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:22:47,912 >> Could not estimate the 
number of tokens of the input, floating-point operations will not be computed 11%|████████▊ | 56/509 [08:52<1:20:44, 10.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:22:53,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:22:59,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▉ | 57/509 [09:03<1:22:11, 10.91s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:23:05,062 >> 
Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:23:10,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|█████████ | 58/509 [09:15<1:22:53, 11.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:23:16,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:23:21,921 >> Could not estimate the number of tokens of 
the input, floating-point operations will not be computed 12%|█████████▎ | 59/509 [09:26<1:23:17, 11.10s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:23:27,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:23:33,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▍ | 60/509 [09:37<1:24:00, 11.22s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:23:39,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 
23:23:44,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▌ | 61/509 [09:49<1:23:44, 11.22s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:23:50,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:23:56,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▋ | 62/509 [10:00<1:24:29, 11.34s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:24:02,106 >> Could not estimate the number of tokens of 
the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:24:07,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▉ | 63/509 [10:12<1:25:01, 11.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:24:13,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:24:19,268 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed 13%|██████████ | 64/509 [10:23<1:24:32, 11.40s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:24:25,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:24:30,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▏ | 65/509 [10:35<1:24:33, 11.43s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:24:36,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:24:43,223 >> Could not estimate the 
number of tokens of the input, floating-point operations will not be computed 13%|██████████▎ | 66/509 [10:47<1:26:36, 11.73s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:24:48,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:24:54,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▌ | 67/509 [10:58<1:25:15, 11.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:24:59,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 
23:25:05,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▋ | 68/509 [11:09<1:23:39, 11.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:10,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:25:16,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▊ | 69/509 [11:20<1:22:51, 11.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:22,110 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 14%|██████████▊ | 69/509 [11:20<1:22:51, 11.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:22,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:25:27,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:25:22,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████ | 70/509 [11:31<1:21:46, 11.18s/it]g-point operations will not be computed-28 23:25:22,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████ | 70/509 [11:31<1:21:46, 11.18s/it]g-point operations will not be computed-28 23:25:22,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████ | 70/509 [11:31<1:21:46, 11.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:32,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████ | 70/509 [11:31<1:21:46, 11.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:32,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:25:38,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:25:32,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 71/509 [11:42<1:20:33, 11.03s/it]g-point operations will not be computed-28 23:25:32,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 71/509 [11:42<1:20:33, 11.03s/it]g-point operations will not be computed-28 
23:25:32,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 71/509 [11:42<1:20:33, 11.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:43,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 71/509 [11:42<1:20:33, 11.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:43,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:25:48,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:25:43,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▎ | 72/509 [11:53<1:19:35, 10.93s/it]g-point operations will not be computed-28 23:25:43,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▎ | 72/509 [11:53<1:19:35, 10.93s/it]g-point operations will not be computed-28 23:25:43,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▎ | 72/509 [11:53<1:19:35, 10.93s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:54,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▎ | 72/509 [11:53<1:19:35, 10.93s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:25:54,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:25:59,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:25:54,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 
23:25:59,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:25:54,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▍ | 73/509 [12:03<1:18:58, 10.87s/it]g-point operations will not be computed-28 23:25:54,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▍ | 73/509 [12:03<1:18:58, 10.87s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:04,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▍ | 73/509 [12:03<1:18:58, 10.87s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:04,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:26:10,232 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:26:04,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 74/509 [12:14<1:18:09, 10.78s/it]g-point operations will not be computed-28 23:26:04,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 74/509 [12:14<1:18:09, 10.78s/it]g-point operations will not be computed-28 23:26:04,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 74/509 [12:14<1:18:09, 10.78s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 74/509 [12:14<1:18:09, 10.78s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed 15%|███████████▋ | 74/509 [12:14<1:18:09, 10.78s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 74/509 [12:14<1:18:09, 10.78s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 75/509 [12:25<1:18:12, 10.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 75/509 [12:25<1:18:12, 10.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 75/509 [12:25<1:18:12, 10.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 75/509 [12:25<1:18:12, 10.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 75/509 [12:25<1:18:12, 10.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 76/509 [12:35<1:17:07, 10.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 76/509 [12:35<1:17:07, 10.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 76/509 [12:35<1:17:07, 
10.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 76/509 [12:35<1:17:07, 10.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 76/509 [12:35<1:17:07, 10.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████ | 77/509 [12:46<1:16:04, 10.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████ | 77/509 [12:46<1:16:04, 10.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████ | 77/509 [12:46<1:16:04, 10.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████ | 77/509 [12:46<1:16:04, 10.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████ | 77/509 [12:46<1:16:04, 10.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████▎ | 78/509 [12:56<1:15:17, 10.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████▎ | 78/509 [12:56<1:15:17, 10.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate 
the number of tokens of the input, floating-point operations will not be computed 15%|████████████▎ | 78/509 [12:56<1:15:17, 10.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████▎ | 78/509 [12:56<1:15:17, 10.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|████████████▎ | 78/509 [12:56<1:15:17, 10.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▍ | 79/509 [13:06<1:14:55, 10.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▍ | 79/509 [13:06<1:14:55, 10.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▍ | 79/509 [13:06<1:14:55, 10.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▍ | 79/509 [13:06<1:14:55, 10.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▍ | 79/509 [13:06<1:14:55, 10.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▌ | 80/509 [13:16<1:13:44, 10.31s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
16%|████████████▌ | 80/509 [13:16<1:13:44, 10.31s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▌ | 80/509 [13:16<1:13:44, 10.31s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▌ | 80/509 [13:16<1:13:44, 10.31s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▌ | 80/509 [13:16<1:13:44, 10.31s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▋ | 81/509 [13:26<1:12:44, 10.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▋ | 81/509 [13:26<1:12:44, 10.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▋ | 81/509 [13:26<1:12:44, 10.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▋ | 81/509 [13:26<1:12:44, 10.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▋ | 81/509 [13:26<1:12:44, 10.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▉ | 82/509 [13:36<1:11:45, 
10.08s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▉ | 82/509 [13:36<1:11:45, 10.08s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▉ | 82/509 [13:36<1:11:45, 10.08s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▉ | 82/509 [13:36<1:11:45, 10.08s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|█████████████ | 83/509 [13:46<1:10:59, 10.00s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|█████████████ | 83/509 [13:46<1:10:59, 10.00s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.5573, 'learning_rate': 1.6000000000000001e-06, 'epoch': 0.16} 16%|█████████████ | 83/509 [13:46<1:10:59, 10.00s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|█████████████ | 83/509 [13:46<1:10:59, 10.00s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▏ | 84/509 [13:55<1:09:29, 9.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▏ | 84/509 [13:55<1:09:29, 
9.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.5433, 'learning_rate': 1.6200000000000002e-06, 'epoch': 0.16} 17%|█████████████▏ | 84/509 [13:55<1:09:29, 9.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▏ | 84/509 [13:55<1:09:29, 9.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▎ | 85/509 [14:04<1:08:00, 9.62s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▎ | 85/509 [14:04<1:08:00, 9.62s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4357, 'learning_rate': 1.6400000000000002e-06, 'epoch': 0.17} 17%|█████████████▎ | 85/509 [14:04<1:08:00, 9.62s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▎ | 85/509 [14:04<1:08:00, 9.62s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▌ | 86/509 [14:13<1:06:37, 9.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▌ | 86/509 [14:13<1:06:37, 9.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed {'loss': 4.4077, 'learning_rate': 1.6600000000000002e-06, 'epoch': 0.17} 17%|█████████████▌ | 86/509 [14:13<1:06:37, 9.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▌ | 86/509 [14:13<1:06:37, 9.45s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▋ | 87/509 [14:22<1:04:57, 9.24s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▋ | 87/509 [14:22<1:04:57, 9.24s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.6741, 'learning_rate': 1.6800000000000002e-06, 'epoch': 0.17} 17%|█████████████▋ | 87/509 [14:22<1:04:57, 9.24s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▋ | 87/509 [14:22<1:04:57, 9.24s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▊ | 88/509 [14:31<1:03:33, 9.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▊ | 88/509 [14:31<1:03:33, 9.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4995, 'learning_rate': 1.7000000000000002e-06, 'epoch': 0.17} 17%|█████████████▊ | 88/509 [14:31<1:03:33, 
9.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▊ | 88/509 [14:31<1:03:33, 9.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▊ | 88/509 [14:31<1:03:33, 9.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▉ | 89/509 [14:39<1:02:16, 8.90s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▉ | 89/509 [14:39<1:02:16, 8.90s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▉ | 89/509 [14:39<1:02:16, 8.90s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▉ | 89/509 [14:39<1:02:16, 8.90s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▉ | 89/509 [14:39<1:02:16, 8.90s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▏ | 90/509 [14:47<1:00:24, 8.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▏ | 90/509 [14:47<1:00:24, 8.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not 
estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▏ | 90/509 [14:47<1:00:24, 8.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▏ | 90/509 [14:47<1:00:24, 8.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▏ | 90/509 [14:47<1:00:24, 8.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▋ | 91/509 [14:55<58:03, 8.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▋ | 91/509 [14:55<58:03, 8.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▋ | 91/509 [14:55<58:03, 8.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▋ | 91/509 [14:55<58:03, 8.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▋ | 91/509 [14:55<58:03, 8.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:26:15,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▊ | 92/509 [15:02<55:22, 7.97s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:02,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed 18%|██████████████▊ | 92/509 [15:02<55:22, 7.97s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:02,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▊ | 92/509 [15:02<55:22, 7.97s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:02,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▊ | 92/509 [15:02<55:22, 7.97s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:02,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▉ | 93/509 [15:09<52:33, 7.58s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:02,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:10,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:02,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:10,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:02,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:10,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:02,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|███████████████▏ | 94/509 [15:15<49:03, 7.09s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|███████████████▏ | 94/509 [15:15<49:03, 7.09s/it][WARNING|modeling_utils.py:388] 
2022-02-28 23:29:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|███████████████▏ | 94/509 [15:15<49:03, 7.09s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:18,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:18,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:22,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:22,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 19%|███████████████▍ | 96/509 [15:25<41:53, 6.09s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:25,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:27,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:25,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 
23:29:27,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:25,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 19%|███████████████▋ | 97/509 [15:29<38:18, 5.58s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:29,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:31,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:29,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:31,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:29,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 19%|███████████████▊ | 98/509 [15:33<34:44, 5.07s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:33,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 19%|███████████████▉ | 99/509 [15:37<31:04, 4.55s/it]g-point operations will not be computed-28 23:29:33,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 19%|███████████████▉ | 99/509 [15:37<31:04, 4.55s/it]g-point operations will not be computed-28 23:29:33,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:37,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:36,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:29:37,553 >> Could not 
estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:36,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 20%|███████████████▉ | 100/509 [15:40<28:30, 4.18s/it]g-point operations will not be computed-28 23:29:36,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 20%|███████████████▉ | 100/509 [15:40<28:30, 4.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 20%|███████████████▉ | 100/509 [15:40<28:30, 4.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 20%|███████████████▉ | 100/509 [15:40<28:30, 4.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 20%|████████████████ | 101/509 [15:53<45:58, 6.76s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 20%|████████████████ | 101/509 [15:53<45:58, 6.76s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3864, 'learning_rate': 1.9600000000000003e-06, 'epoch': 0.2} 20%|████████████████ | 101/509 [15:53<45:58, 6.76s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 20%|████████████████ | 101/509 [15:53<45:58, 6.76s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
20%|████████████████▏ | 102/509 [16:05<56:58, 8.40s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3858, 'learning_rate': 1.98e-06, 'epoch': 0.2}
20%|███████████████▉ | 103/509 [16:17<1:04:09, 9.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2273, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.2}
20%|████████████████▏ | 104/509 [16:29<1:09:01, 10.23s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2232, 'learning_rate': 2.02e-06, 'epoch': 0.2}
21%|████████████████▎ | 105/509 [16:41<1:12:08, 10.71s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1973, 'learning_rate': 2.04e-06, 'epoch': 0.21}
21%|████████████████▍ | 106/509 [16:53<1:14:08, 11.04s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2116, 'learning_rate': 2.06e-06, 'epoch': 0.21}
21%|████████████████▌ | 107/509 [17:04<1:15:14, 11.23s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3086, 'learning_rate': 2.08e-06, 'epoch': 0.21}
21%|████████████████▊ | 108/509 [17:16<1:15:40, 11.32s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
21%|████████████████▉ | 109/509 [17:27<1:15:34, 11.34s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
22%|█████████████████ | 110/509 [17:38<1:15:15, 11.32s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
22%|█████████████████▏ | 111/509 [17:50<1:14:47, 11.28s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
22%|█████████████████▍ | 112/509 [18:01<1:14:12, 11.22s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2667, 'learning_rate': 2.1800000000000003e-06, 'epoch': 0.22}
22%|█████████████████▌ | 113/509 [18:12<1:13:46, 11.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
22%|█████████████████▋ | 114/509 [18:23<1:13:12, 11.12s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.342, 'learning_rate': 2.24e-06, 'epoch': 0.23}
23%|██████████████████ | 116/509 [18:45<1:12:18, 11.04s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
23%|██████████████████▏ | 117/509 [18:56<1:11:58, 11.02s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4111, 'learning_rate': 2.28e-06, 'epoch': 0.23}
23%|██████████████████▎ | 118/509 [19:06<1:11:19, 10.94s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3774, 'learning_rate': 2.3000000000000004e-06, 'epoch': 0.23}
23%|██████████████████▍ | 119/509 [19:17<1:10:33, 10.85s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.29, 'learning_rate': 2.3200000000000002e-06, 'epoch': 0.23}
23%|██████████████████▍ | 119/509 [19:17<1:10:33, 10.85s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3062, 'learning_rate': 2.3400000000000005e-06, 'epoch': 0.24}
24%|██████████████████▊ | 121/509 [19:38<1:08:48, 10.64s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|██████████████████▉ | 122/509 [19:48<1:08:03, 10.55s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.324, 'learning_rate': 2.38e-06, 'epoch': 0.24}
24%|███████████████████ | 123/509 [19:58<1:07:11, 10.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|███████████████████▏ | 124/509 [20:08<1:06:15, 10.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
25%|███████████████████▍ | 125/509 [20:19<1:06:14, 10.35s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
25%|███████████████████▌ | 126/509 [20:29<1:05:05, 10.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
25%|███████████████████▋ | 127/509 [20:39<1:04:15, 10.09s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3378, 'learning_rate': 2.4800000000000004e-06, 'epoch': 0.25}
25%|███████████████████▊ | 128/509 [20:48<1:03:35, 10.01s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2826, 'learning_rate': 2.5e-06, 'epoch': 0.25}
25%|████████████████████ | 
129/509 [20:58<1:02:35, 9.88s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2251, 'learning_rate': 2.52e-06, 'epoch': 0.25}
25%|████████████████████ | 129/509 [20:58<1:02:35, 9.88s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4072, 'learning_rate': 2.5400000000000002e-06, 'epoch': 0.26}
26%|████████████████████▎ | 131/509 [21:17<1:01:00, 9.68s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2784, 'learning_rate': 2.56e-06, 'epoch': 0.26}
26%|████████████████████▍ | 132/509 [21:26<1:00:17, 9.59s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
26%|█████████████████████▏ | 133/509 [21:36<59:31, 9.50s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.327, 'learning_rate': 2.6200000000000003e-06, 'epoch': 0.26}
26%|█████████████████████▏ | 133/509 [21:36<59:31, 9.50s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:35:52,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.35, 'learning_rate': 2.64e-06, 'epoch': 0.26}
27%|█████████████████████▋ | 136/509 [22:02<56:22, 9.07s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
27%|█████████████████████▊ | 137/509 [22:11<55:04, 8.88s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4093, 'learning_rate': 2.68e-06, 'epoch': 0.27}
27%|█████████████████████▉ | 138/509 [22:19<53:30, 8.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3439, 'learning_rate': 2.7000000000000004e-06, 'epoch': 0.27}
27%|██████████████████████ | 139/509 [22:27<52:00, 8.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
28%|██████████████████████▎ | 140/509 [22:34<50:12, 8.16s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:29:42,095 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 28%|██████████████████████▎ | 140/509 [22:34<50:12, 8.16s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▎ | 140/509 [22:34<50:12, 8.16s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:36:40,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:36:40,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.451, 'learning_rate': 2.7600000000000003e-06, 'epoch': 0.28} [WARNING|modeling_utils.py:388] 2022-02-28 23:36:40,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:36:40,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:36:40,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
28%|██████████████████████▌ | 142/509 [22:48<45:57, 7.51s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:36:50,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
28%|██████████████████████▊ | 143/509 [22:54<43:20, 7.10s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:36:56,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:36:58,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3836, 'learning_rate': 2.82e-06, 'epoch': 0.28}
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:02,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
28%|███████████████████████ | 145/509 [23:05<37:40, 6.21s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:06,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:08,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:10,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:12,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:14,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:15,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:18,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:20,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:22,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:37:28,274 >> Could not estimate the number of tokens of the input, floating-point operations will
not be computed
30%|████████████████████████ | 151/509 [23:35<37:50, 6.34s/it]
30%|████████████████████████▏ | 152/509 [23:47<47:29, 7.98s/it]
30%|████████████████████████▎ | 153/509 [23:59<53:39, 9.04s/it]
{'loss': 4.2062, 'learning_rate': 3e-06, 'epoch': 0.3}
30%|████████████████████████▌ | 154/509 [24:10<57:34, 9.73s/it]
30%|████████████████████████ | 155/509 [24:21<1:00:28, 10.25s/it]
{'loss': 4.2038, 'learning_rate': 3.04e-06, 'epoch': 0.3}
31%|████████████████████████▏ | 156/509 [24:33<1:02:10, 10.57s/it]
31%|████████████████████████▎ | 157/509 [24:44<1:03:11, 10.77s/it]
{'loss': 4.3046, 'learning_rate': 3.08e-06, 'epoch': 0.31}
31%|████████████████████████▌ | 158/509 [24:55<1:03:44, 10.90s/it]
{'loss': 4.3459, 'learning_rate': 3.1000000000000004e-06, 'epoch': 0.31}
31%|████████████████████████▋ | 159/509 [25:06<1:03:58, 10.97s/it]
31%|████████████████████████▊ | 160/509 [25:18<1:04:05, 11.02s/it]
{'loss': 4.389, 'learning_rate': 3.1400000000000004e-06, 'epoch': 0.31}
32%|████████████████████████▉ | 161/509 [25:29<1:04:04, 11.05s/it]
32%|█████████████████████████▏ | 162/509 [25:40<1:03:37, 11.00s/it]
32%|█████████████████████████▎ | 163/509 [25:50<1:03:03, 10.93s/it]
{'loss': 4.348, 'learning_rate': 3.2000000000000003e-06, 'epoch': 0.32}
32%|█████████████████████████▍ | 164/509 [26:01<1:02:36, 10.89s/it]
{'loss': 4.2921, 'learning_rate': 3.2200000000000005e-06, 'epoch': 0.32}
{'loss': 4.2242, 'learning_rate': 3.2400000000000003e-06, 'epoch': 0.32}
33%|█████████████████████████▊ | 166/509 [26:23<1:01:46, 10.81s/it]
{'loss': 4.3369, 'learning_rate': 3.2600000000000006e-06, 'epoch': 0.33}
33%|█████████████████████████▉ | 167/509 [26:33<1:01:24, 10.77s/it]
33%|██████████████████████████ | 168/509 [26:44<1:00:56, 10.72s/it]
33%|██████████████████████████▏ | 169/509 [26:54<1:00:30, 10.68s/it]
{'loss': 4.3286, 'learning_rate': 3.3200000000000004e-06, 'epoch': 0.33}
33%|███████████████████████████ | 170/509 [27:05<59:51, 10.59s/it]
34%|███████████████████████████▏ | 171/509 [27:15<59:15, 10.52s/it]
{'loss': 4.2808, 'learning_rate': 3.3600000000000004e-06, 'epoch': 0.34}
34%|███████████████████████████▎ | 172/509 [27:25<58:39, 10.44s/it]
{'loss': 4.1993, 'learning_rate': 3.3800000000000007e-06, 'epoch': 0.34}
{'loss': 4.155, 'learning_rate': 3.4000000000000005e-06, 'epoch': 0.34}
34%|███████████████████████████▋ | 174/509 [27:46<57:28, 10.30s/it]g-point operations will not be 
computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 34%|███████████████████████████▋ | 174/509 [27:46<57:28, 10.30s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3105, 'learning_rate': 3.4200000000000007e-06, 'epoch': 0.34} 34%|███████████████████████████▋ | 174/509 [27:46<57:28, 10.30s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 34%|███████████████████████████▋ | 174/509 [27:46<57:28, 10.30s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 34%|███████████████████████████▊ | 175/509 [27:56<57:33, 10.34s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 34%|███████████████████████████▊ | 175/509 [27:56<57:33, 10.34s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3962, 'learning_rate': 3.44e-06, 'epoch': 0.34} 34%|███████████████████████████▊ | 175/509 [27:56<57:33, 10.34s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 34%|███████████████████████████▊ | 175/509 [27:56<57:33, 10.34s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████ | 176/509 [28:06<56:44, 10.22s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the 
number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████ | 176/509 [28:06<56:44, 10.22s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2565, 'learning_rate': 3.46e-06, 'epoch': 0.35} 35%|████████████████████████████ | 176/509 [28:06<56:44, 10.22s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████ | 176/509 [28:06<56:44, 10.22s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▏ | 177/509 [28:16<55:44, 10.07s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▏ | 177/509 [28:16<55:44, 10.07s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2718, 'learning_rate': 3.48e-06, 'epoch': 0.35} 35%|████████████████████████████▏ | 177/509 [28:16<55:44, 10.07s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▏ | 177/509 [28:16<55:44, 10.07s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▎ | 178/509 [28:26<55:00, 9.97s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will 
not be computed 35%|████████████████████████████▎ | 178/509 [28:26<55:00, 9.97s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3334, 'learning_rate': 3.5e-06, 'epoch': 0.35} 35%|████████████████████████████▎ | 178/509 [28:26<55:00, 9.97s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▎ | 178/509 [28:26<55:00, 9.97s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▍ | 179/509 [28:35<54:13, 9.86s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▍ | 179/509 [28:35<54:13, 9.86s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2852, 'learning_rate': 3.52e-06, 'epoch': 0.35} 35%|████████████████████████████▍ | 179/509 [28:35<54:13, 9.86s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▍ | 179/509 [28:35<54:13, 9.86s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▋ | 180/509 [28:45<53:29, 9.75s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▋ | 180/509 
[28:45<53:29, 9.75s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3456, 'learning_rate': 3.54e-06, 'epoch': 0.35} 35%|████████████████████████████▋ | 180/509 [28:45<53:29, 9.75s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|████████████████████████████▋ | 180/509 [28:45<53:29, 9.75s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▊ | 181/509 [28:54<52:51, 9.67s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▊ | 181/509 [28:54<52:51, 9.67s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2246, 'learning_rate': 3.5600000000000002e-06, 'epoch': 0.36} 36%|████████████████████████████▊ | 181/509 [28:54<52:51, 9.67s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▊ | 181/509 [28:54<52:51, 9.67s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▉ | 182/509 [29:03<52:05, 9.56s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▉ | 182/509 [29:03<52:05, 9.56s/it]g-point operations will not 
be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3593, 'learning_rate': 3.58e-06, 'epoch': 0.36} 36%|████████████████████████████▉ | 182/509 [29:03<52:05, 9.56s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▉ | 182/509 [29:03<52:05, 9.56s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████ | 183/509 [29:13<51:10, 9.42s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████ | 183/509 [29:13<51:10, 9.42s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2121, 'learning_rate': 3.6000000000000003e-06, 'epoch': 0.36} 36%|█████████████████████████████ | 183/509 [29:13<51:10, 9.42s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████ | 183/509 [29:13<51:10, 9.42s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▎ | 184/509 [29:22<50:16, 9.28s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▎ | 184/509 [29:22<50:16, 9.28s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate 
the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2954, 'learning_rate': 3.62e-06, 'epoch': 0.36} 36%|█████████████████████████████▎ | 184/509 [29:22<50:16, 9.28s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▎ | 184/509 [29:22<50:16, 9.28s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▎ | 184/509 [29:22<50:16, 9.28s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ 
| 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2171, 'learning_rate': 3.66e-06, 'epoch': 0.36} 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|█████████████████████████████▍ | 185/509 [29:30<49:21, 9.14s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|█████████████████████████████▊ | 187/509 [29:47<47:16, 8.81s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:43:50,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:43:50,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|█████████████████████████████▉ | 188/509 
[29:56<46:11, 8.64s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|█████████████████████████████▉ | 188/509 [29:56<46:11, 8.64s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2459, 'learning_rate': 3.7e-06, 'epoch': 0.37} 37%|█████████████████████████████▉ | 188/509 [29:56<46:11, 8.64s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|█████████████████████████████▉ | 188/509 [29:56<46:11, 8.64s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|██████████████████████████████ | 189/509 [30:03<44:51, 8.41s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|██████████████████████████████ | 189/509 [30:03<44:51, 8.41s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4374, 'learning_rate': 3.7200000000000004e-06, 'epoch': 0.37} 37%|██████████████████████████████ | 189/509 [30:03<44:51, 8.41s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|██████████████████████████████ | 189/509 [30:03<44:51, 8.41s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|██████████████████████████████▏ | 190/509 [30:11<43:10, 8.12s/it]g-point operations 
will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 37%|██████████████████████████████▏ | 190/509 [30:11<43:10, 8.12s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3142, 'learning_rate': 3.74e-06, 'epoch': 0.37} 37%|██████████████████████████████▏ | 190/509 [30:11<43:10, 8.12s/it]g-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:16,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:16,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3617, 'learning_rate': 3.7600000000000004e-06, 'epoch': 0.37} [WARNING|modeling_utils.py:388] 2022-02-28 23:44:16,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:16,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:16,746 >> Could not estimate the number of tokens of the input, floating-point operations will not 
be computed-28 23:29:42,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|██████████████████████████████▌ | 192/509 [30:25<39:19, 7.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:44:25,036 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|██████████████████████████████▌ | 192/509 [30:25<39:19, 7.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:44:25,036 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|██████████████████████████████▌ | 192/509 [30:25<39:19, 7.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:44:25,036 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|██████████████████████████████▌ | 192/509 [30:25<39:19, 7.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:44:25,036 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|██████████████████████████████▋ | 193/509 [30:31<37:07, 7.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:44:31,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|██████████████████████████████▋ | 193/509 [30:31<37:07, 7.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:44:31,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:35,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:31,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:35,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:31,072 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:37,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:31,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:37,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:31,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|███████████████████████████████ | 195/509 [30:41<32:11, 6.15s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 38%|███████████████████████████████ | 195/509 [30:41<32:11, 6.15s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:43,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 39%|███████████████████████████████▏ | 196/509 [30:46<29:36, 5.68s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 39%|███████████████████████████████▏ | 196/509 [30:46<29:36, 5.68s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:46,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 
23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:48,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:48,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:50,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:52,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:52,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:55,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:55,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the 
number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:56,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:58,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:44:58,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.6378, 'learning_rate': 3.94e-06, 'epoch': 0.39} [WARNING|modeling_utils.py:388] 2022-02-28 23:45:04,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:45:04,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:45:04,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 39%|███████████████████████████████▉ | 201/509 [31:12<32:47, 6.39s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, 
floating-point operations will not be computed 39%|███████████████████████████████▉ | 201/509 [31:12<32:47, 6.39s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 39%|███████████████████████████████▉ | 201/509 [31:12<32:47, 6.39s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 39%|███████████████████████████████▉ | 201/509 [31:12<32:47, 6.39s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 40%|████████████████████████████████▏ | 202/509 [31:23<40:39, 7.95s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 40%|████████████████████████████████▏ | 202/509 [31:23<40:39, 7.95s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2226, 'learning_rate': 3.980000000000001e-06, 'epoch': 0.4} 40%|████████████████████████████████▏ | 202/509 [31:23<40:39, 7.95s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 40%|████████████████████████████████▏ | 202/509 [31:23<40:39, 7.95s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 40%|████████████████████████████████▏ | 202/509 [31:23<40:39, 7.95s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
40%|████████████████████████████████▎ | 203/509 [31:35<46:04, 9.03s/it]
40%|████████████████████████████████▍ | 204/509 [31:46<49:28, 9.73s/it]
40%|████████████████████████████████▌ | 205/509 [31:58<51:52, 10.24s/it]
{'loss': 4.3026, 'learning_rate': 4.04e-06, 'epoch': 0.4}
40%|████████████████████████████████▊ | 206/509 [32:09<53:24, 10.58s/it]
41%|████████████████████████████████▉ | 207/509 [32:21<54:20, 10.80s/it]
{'loss': 4.1553, 'learning_rate': 4.08e-06, 'epoch': 0.41}
41%|█████████████████████████████████ | 208/509 [32:32<54:49, 10.93s/it]
41%|█████████████████████████████████▎ | 209/509 [32:43<54:52, 10.98s/it]
{'loss': 4.2173, 'learning_rate': 4.12e-06, 'epoch': 0.41}
41%|█████████████████████████████████▍ | 210/509 [32:54<54:47, 10.99s/it]
{'loss': 4.345, 'learning_rate': 4.14e-06, 'epoch': 0.41}
41%|█████████████████████████████████▌ | 211/509 [33:05<54:31, 10.98s/it]
42%|█████████████████████████████████▋ | 212/509 [33:16<54:12, 10.95s/it]
{'loss': 4.2244, 'learning_rate': 4.18e-06, 'epoch': 0.42}
42%|█████████████████████████████████▉ | 213/509 [33:27<53:46, 10.90s/it]
{'loss': 4.118, 'learning_rate': 4.2000000000000004e-06, 'epoch': 0.42}
{'loss': 4.2683, 'learning_rate': 4.22e-06, 'epoch': 0.42}
42%|██████████████████████████████████▏ | 215/509 [33:48<52:54, 10.80s/it]
{'loss': 4.2008, 'learning_rate': 4.24e-06, 'epoch': 0.42}
42%|██████████████████████████████████▎ | 216/509 [33:58<52:18, 10.71s/it]
43%|██████████████████████████████████▌ | 217/509 [34:09<51:54, 10.67s/it]
43%|██████████████████████████████████▋ | 218/509 [34:19<51:28, 10.61s/it]
43%|██████████████████████████████████▊ | 219/509 [34:30<50:56, 10.54s/it]
43%|███████████████████████████████████ | 220/509 [34:40<50:24, 10.47s/it]
43%|███████████████████████████████████▏ | 221/509 [34:50<50:01, 10.42s/it]
44%|███████████████████████████████████▎ | 222/509 [35:01<49:25, 10.33s/it]
44%|███████████████████████████████████▍ | 223/509 [35:11<49:04, 10.29s/it]
44%|███████████████████████████████████▋ | 224/509 [35:21<48:34, 10.22s/it]
44%|███████████████████████████████████▊ | 225/509 [35:31<48:39, 10.28s/it]
44%|███████████████████████████████████▉ | 226/509 [35:41<48:01, 10.18s/it]
{'loss': 4.2809, 'learning_rate': 4.48e-06, 'epoch': 0.45}
45%|████████████████████████████████████▎ | 228/509 [36:01<46:42, 9.97s/it]
{'loss': 4.0934, 'learning_rate': 4.5e-06, 'epoch': 0.45}
45%|████████████████████████████████████▍ | 229/509 [36:10<45:57, 9.85s/it]
{'loss': 4.3046, 'learning_rate': 4.520000000000001e-06, 'epoch': 0.45}
45%|████████████████████████████████████▌ | 230/509 [36:20<45:29, 9.78s/it]
{'loss': 4.3771, 'learning_rate': 4.540000000000001e-06, 'epoch': 0.45}
45%|████████████████████████████████████▊ | 231/509 [36:30<44:59, 9.71s/it]
{'loss': 4.399, 'learning_rate': 4.56e-06, 'epoch': 0.45}
46%|████████████████████████████████████▉ | 232/509 [36:39<44:19, 9.60s/it]
{'loss': 4.115, 'learning_rate': 4.58e-06, 'epoch': 0.46}
46%|█████████████████████████████████████ | 233/509 [36:48<43:45, 9.51s/it]
{'loss': 4.2761, 'learning_rate': 4.600000000000001e-06, 'epoch': 0.46}
46%|█████████████████████████████████████▏ | 234/509 [36:57<43:03, 9.39s/it]
{'loss': 4.1011, 'learning_rate': 4.620000000000001e-06, 'epoch': 0.46}
46%|█████████████████████████████████████▍ | 235/509 [37:06<42:12, 9.24s/it]
{'loss': 4.3453, 'learning_rate': 4.6400000000000005e-06, 'epoch': 0.46}
46%|█████████████████████████████████████▌ | 236/509 [37:15<41:24, 9.10s/it]
47%|█████████████████████████████████████▋ | 237/509 [37:24<40:33, 8.95s/it]
47%|█████████████████████████████████████▊ | 238/509 [37:32<39:35, 8.77s/it]
{'loss': 4.0226, 'learning_rate': 4.7e-06, 'epoch': 0.47}
47%|██████████████████████████████████████ | 239/509 [37:40<38:26, 8.54s/it]g-point operations will not be 
computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|██████████████████████████████████████ | 239/509 [37:40<38:26, 8.54s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|██████████████████████████████████████ | 239/509 [37:40<38:26, 8.54s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|██████████████████████████████████████▏ | 240/509 [37:48<37:06, 8.28s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|██████████████████████████████████████▏ | 240/509 [37:48<37:06, 8.28s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|██████████████████████████████████████▏ | 240/509 [37:48<37:06, 8.28s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|██████████████████████████████████████▏ | 240/509 [37:48<37:06, 8.28s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|██████████████████████████████████████▏ | 240/509 [37:48<37:06, 8.28s/it]g-point operations will not be computed-28 23:44:41,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|██████████████████████████████████████▎ | 241/509 [37:55<35:38, 7.98s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:51:55,506 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:51:58,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|██████████████████████████████████████▌ | 242/509 [38:02<34:01, 7.65s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:05,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|██████████████████████████████████████▋ | 243/509 [38:08<32:15, 7.28s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:11,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|██████████████████████████████████████▊ | 244/509 [38:14<30:29, 6.90s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:17,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|██████████████████████████████████████▉ | 245/509 [38:20<28:26, 6.46s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:23,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:25,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:27,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:29,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:31,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:32,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:35,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:37,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:43,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
49%|███████████████████████████████████████▉ | 251/509 [38:51<27:54, 6.49s/it]
50%|████████████████████████████████████████ | 252/509 [39:03<34:31, 8.06s/it]
50%|████████████████████████████████████████▎ | 253/509 [39:14<38:55, 9.12s/it]
{'loss': 4.3221, 'learning_rate': 5e-06, 'epoch': 0.5}
[WARNING|modeling_utils.py:388] 2022-02-28 23:52:19,766 >> Could
not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▎ | 253/509 [39:14<38:55, 9.12s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▍ | 254/509 [39:26<41:40, 9.81s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▍ | 254/509 [39:26<41:40, 9.81s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▍ | 254/509 [39:26<41:40, 9.81s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▍ | 254/509 [39:26<41:40, 9.81s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2932, 'learning_rate': 5.04e-06, 'epoch': 0.5} g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point 
operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▋ | 256/509 [39:48<44:44, 10.61s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▋ | 256/509 [39:48<44:44, 10.61s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2929, 'learning_rate': 5.060000000000001e-06, 'epoch': 0.5} 50%|████████████████████████████████████████▋ | 256/509 [39:48<44:44, 10.61s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▋ | 256/509 [39:48<44:44, 10.61s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▉ | 257/509 [40:00<45:26, 10.82s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 50%|████████████████████████████████████████▉ | 257/509 [40:00<45:26, 10.82s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2025, 'learning_rate': 5.0800000000000005e-06, 'epoch': 0.5} 50%|████████████████████████████████████████▉ | 257/509 [40:00<45:26, 10.82s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
50%|████████████████████████████████████████▉ | 257/509 [40:00<45:26, 10.82s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████ | 258/509 [40:11<45:48, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████ | 258/509 [40:11<45:48, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3145, 'learning_rate': 5.1e-06, 'epoch': 0.51} 51%|█████████████████████████████████████████ | 258/509 [40:11<45:48, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████ | 258/509 [40:11<45:48, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2676, 'learning_rate': 5.12e-06, 'epoch': 0.51} g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 
23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▍ | 260/509 [40:33<45:42, 11.01s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▍ | 260/509 [40:33<45:42, 11.01s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▍ | 260/509 [40:33<45:42, 11.01s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▍ | 260/509 [40:33<45:42, 11.01s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▌ | 261/509 [40:44<45:25, 10.99s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▌ | 261/509 [40:44<45:25, 10.99s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1728, 'learning_rate': 5.1600000000000006e-06, 'epoch': 0.51} 51%|█████████████████████████████████████████▌ | 261/509 [40:44<45:25, 10.99s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, 
floating-point operations will not be computed 51%|█████████████████████████████████████████▌ | 261/509 [40:44<45:25, 10.99s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▋ | 262/509 [40:55<45:18, 11.01s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▋ | 262/509 [40:55<45:18, 11.01s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1992, 'learning_rate': 5.18e-06, 'epoch': 0.51} 51%|█████████████████████████████████████████▋ | 262/509 [40:55<45:18, 11.01s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 51%|█████████████████████████████████████████▋ | 262/509 [40:55<45:18, 11.01s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|█████████████████████████████████████████▊ | 263/509 [41:06<44:57, 10.97s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|█████████████████████████████████████████▊ | 263/509 [41:06<44:57, 10.97s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2509, 'learning_rate': 5.2e-06, 'epoch': 0.52} 52%|█████████████████████████████████████████▊ | 263/509 [41:06<44:57, 10.97s/it]g-point operations will not be computed-28 23:52:19,766 >> Could 
not estimate the number of tokens of the input, floating-point operations will not be computed 52%|█████████████████████████████████████████▊ | 263/509 [41:06<44:57, 10.97s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|█████████████████████████████████████████▊ | 263/509 [41:06<44:57, 10.97s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|██████████████████████████████████████████ | 264/509 [41:17<44:42, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|██████████████████████████████████████████ | 264/509 [41:17<44:42, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|██████████████████████████████████████████ | 264/509 [41:17<44:42, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|██████████████████████████████████████████ | 264/509 [41:17<44:42, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|██████████████████████████████████████████ | 264/509 [41:17<44:42, 10.95s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 52%|██████████████████████████████████████████▏ | 265/509 [41:28<44:23, 10.92s/it]g-point operations will not be computed-28 23:52:19,766 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed
52%|██████████████████████████████████████████▏ | 265/509 [41:28<44:23, 10.92s/it]
52%|██████████████████████████████████████████▎ | 266/509 [41:39<44:04, 10.88s/it]
{'loss': 4.2429, 'learning_rate': 5.2600000000000005e-06, 'epoch': 0.52}
52%|██████████████████████████████████████████▍ | 267/509 [41:49<43:30, 10.79s/it]
{'loss': 4.287, 'learning_rate': 5.28e-06, 'epoch': 0.52}
53%|██████████████████████████████████████████▋ | 268/509 [42:00<43:07, 10.74s/it]
{'loss': 4.1914, 'learning_rate': 5.300000000000001e-06, 'epoch': 0.53}
53%|██████████████████████████████████████████▊ | 269/509 [42:10<42:31, 10.63s/it]
{'loss': 4.2218, 'learning_rate': 5.320000000000001e-06, 'epoch': 0.53}
53%|██████████████████████████████████████████▉ | 270/509 [42:21<42:08, 10.58s/it]
{'loss': 4.1518, 'learning_rate': 5.3400000000000005e-06, 'epoch': 0.53}
53%|███████████████████████████████████████████▏ | 271/509 [42:31<41:40, 10.51s/it]
{'loss': 4.2039, 'learning_rate': 5.36e-06, 'epoch': 0.53}
53%|███████████████████████████████████████████▎ | 272/509 [42:41<41:06, 10.41s/it]
{'loss': 4.1447, 'learning_rate': 5.380000000000001e-06, 'epoch': 0.53}
54%|███████████████████████████████████████████▍ | 273/509 [42:51<40:34, 10.31s/it]
{'loss': 4.2428, 'learning_rate': 5.400000000000001e-06, 'epoch': 0.54}
54%|███████████████████████████████████████████▌ | 274/509 [43:01<40:09, 10.25s/it]
{'loss': 4.2338, 'learning_rate': 5.420000000000001e-06, 'epoch': 0.54}
54%|███████████████████████████████████████████▊ | 275/509 [43:12<40:27, 10.37s/it]
54%|███████████████████████████████████████████▉ | 276/509 [43:22<39:52, 10.27s/it]
54%|████████████████████████████████████████████ | 277/509 [43:32<39:20, 10.17s/it]
55%|████████████████████████████████████████████▏ | 278/509 [43:42<38:42, 10.05s/it]
55%|████████████████████████████████████████████▍ | 279/509 [43:51<37:59, 9.91s/it]
{'loss': 4.0988, 'learning_rate': 5.5200000000000005e-06, 'epoch': 0.55}
55%|████████████████████████████████████████████▌ | 280/509 [44:01<37:31, 9.83s/it]
55%|████████████████████████████████████████████▋ | 281/509 [44:11<37:05, 9.76s/it]
55%|████████████████████████████████████████████▉ | 282/509 [44:20<36:34, 9.67s/it]
{'loss': 4.1259, 'learning_rate': 5.580000000000001e-06, 'epoch': 0.55}
56%|█████████████████████████████████████████████ | 283/509 [44:29<35:59, 9.55s/it]
56%|█████████████████████████████████████████████▏ | 284/509 [44:38<35:17, 9.41s/it]
{'loss': 4.2853, 'learning_rate': 5.620000000000001e-06, 'epoch': 0.56}
56%|█████████████████████████████████████████████▎ | 285/509 [44:47<34:40, 9.29s/it]
56%|█████████████████████████████████████████████▌ | 286/509 [44:56<33:57, 9.14s/it]
{'loss': 4.2353, 'learning_rate': 5.66e-06, 'epoch': 0.56}
56%|█████████████████████████████████████████████▋ | 287/509 [45:05<33:19, 9.01s/it]
{'loss': 4.2637, 'learning_rate': 5.68e-06, 'epoch': 0.56}
57%|█████████████████████████████████████████████▊ | 288/509 [45:13<32:21, 8.78s/it]
{'loss': 4.1921, 'learning_rate': 5.7e-06, 'epoch': 0.56}
57%|█████████████████████████████████████████████▉ | 289/509 [45:21<31:18, 8.54s/it]
{'loss': 4.319, 'learning_rate': 5.72e-06, 'epoch': 0.57}
57%|██████████████████████████████████████████████▏ | 290/509 [45:29<29:53, 8.19s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:59:30,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
57%|██████████████████████████████████████████████▎ | 291/509 [45:36<28:28, 7.84s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:59:40,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:59:47,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:59:51,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
58%|██████████████████████████████████████████████▊ | 294/509 [45:54<23:31, 6.56s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:59:54,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:59:57,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:59:59,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:00:01,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1334, 'learning_rate': 5.86e-06, 'epoch': 0.58}
58%|███████████████████████████████████████████████▎ | 297/509 [46:07<17:46, 5.03s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:00:06,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
59%|███████████████████████████████████████████████▍ | 298/509 [46:10<15:57, 4.54s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:00:09,688 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:00:11,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
59%|███████████████████████████████████████████████▌ | 299/509 [46:13<14:14, 4.07s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:00:12,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
59%|███████████████████████████████████████████████▋ | 300/509 [46:16<13:06, 3.76s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:00:24,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
59%|███████████████████████████████████████████████▉ | 301/509 [46:28<21:41, 6.26s/it]
{'loss': 4.2167, 'learning_rate': 5.98e-06, 'epoch': 0.59}
60%|████████████████████████████████████████████████▏ | 303/509 [46:51<30:55, 9.01s/it]
{'loss': 4.2554, 'learning_rate': 6e-06, 'epoch': 0.59}
60%|████████████████████████████████████████████████▍ | 304/509 [47:03<33:12, 9.72s/it]
60%|████████████████████████████████████████████████▍ | 
304/509 [47:03<33:12, 9.72s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▌ | 305/509 [47:14<34:45, 10.22s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▌ | 305/509 [47:14<34:45, 10.22s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1688, 'learning_rate': 6.040000000000001e-06, 'epoch': 0.6} 60%|████████████████████████████████████████████████▌ | 305/509 [47:14<34:45, 10.22s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▌ | 305/509 [47:14<34:45, 10.22s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▋ | 306/509 [47:25<35:42, 10.55s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▋ | 306/509 [47:25<35:42, 10.55s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1835, 'learning_rate': 6.0600000000000004e-06, 'epoch': 0.6} 60%|████████████████████████████████████████████████▋ | 306/509 [47:25<35:42, 10.55s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number 
of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▋ | 306/509 [47:25<35:42, 10.55s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▊ | 307/509 [47:37<36:10, 10.74s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▊ | 307/509 [47:37<36:10, 10.74s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2019, 'learning_rate': 6.08e-06, 'epoch': 0.6} 60%|████████████████████████████████████████████████▊ | 307/509 [47:37<36:10, 10.74s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 60%|████████████████████████████████████████████████▊ | 307/509 [47:37<36:10, 10.74s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████ | 308/509 [47:48<36:21, 10.85s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████ | 308/509 [47:48<36:21, 10.85s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.239, 'learning_rate': 6.1e-06, 'epoch': 0.6} 61%|█████████████████████████████████████████████████ | 308/509 [47:48<36:21, 
10.85s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████ | 308/509 [47:48<36:21, 10.85s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▏ | 309/509 [47:59<36:25, 10.93s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▏ | 309/509 [47:59<36:25, 10.93s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1771, 'learning_rate': 6.120000000000001e-06, 'epoch': 0.61} 61%|█████████████████████████████████████████████████▏ | 309/509 [47:59<36:25, 10.93s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▏ | 309/509 [47:59<36:25, 10.93s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▏ | 309/509 [47:59<36:25, 10.93s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▎ | 310/509 [48:10<36:18, 10.95s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
61%|█████████████████████████████████████████████████▎ | 310/509 [48:10<36:18, 10.95s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▎ | 310/509 [48:10<36:18, 10.95s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▎ | 310/509 [48:10<36:18, 10.95s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▎ | 310/509 [48:10<36:18, 10.95s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▍ | 311/509 [48:21<36:12, 10.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▍ | 311/509 [48:21<36:12, 10.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▍ | 311/509 [48:21<36:12, 10.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▍ | 311/509 [48:21<36:12, 10.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
61%|█████████████████████████████████████████████████▍ | 311/509 [48:21<36:12, 10.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▋ | 312/509 [48:32<35:51, 10.92s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▋ | 312/509 [48:32<35:51, 10.92s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▋ | 312/509 [48:32<35:51, 10.92s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▋ | 312/509 [48:32<35:51, 10.92s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▊ | 313/509 [48:42<35:31, 10.87s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▊ | 313/509 [48:42<35:31, 10.87s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1747, 'learning_rate': 6.200000000000001e-06, 'epoch': 0.61} 61%|█████████████████████████████████████████████████▊ | 313/509 [48:42<35:31, 10.87s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 61%|█████████████████████████████████████████████████▊ | 313/509 [48:42<35:31, 10.87s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|█████████████████████████████████████████████████▉ | 314/509 [48:53<35:12, 10.83s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|█████████████████████████████████████████████████▉ | 314/509 [48:53<35:12, 10.83s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1698, 'learning_rate': 6.220000000000001e-06, 'epoch': 0.62} 62%|█████████████████████████████████████████████████▉ | 314/509 [48:53<35:12, 10.83s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|█████████████████████████████████████████████████▉ | 314/509 [48:53<35:12, 10.83s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▏ | 315/509 [49:04<35:00, 10.83s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▏ | 315/509 [49:04<35:00, 10.83s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2586, 'learning_rate': 6.24e-06, 'epoch': 0.62} 62%|██████████████████████████████████████████████████▏ 
| 315/509 [49:04<35:00, 10.83s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▏ | 315/509 [49:04<35:00, 10.83s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1145, 'learning_rate': 6.26e-06, 'epoch': 0.62} g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▍ | 317/509 [49:25<34:20, 10.73s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▍ | 317/509 [49:25<34:20, 10.73s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
62%|██████████████████████████████████████████████████▍ | 317/509 [49:25<34:20, 10.73s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▍ | 317/509 [49:25<34:20, 10.73s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▍ | 317/509 [49:25<34:20, 10.73s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▌ | 318/509 [49:36<34:00, 10.69s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▌ | 318/509 [49:36<34:00, 10.69s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▌ | 318/509 [49:36<34:00, 10.69s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▌ | 318/509 [49:36<34:00, 10.69s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 62%|██████████████████████████████████████████████████▌ | 318/509 [49:36<34:00, 10.69s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
63%|██████████████████████████████████████████████████▊ | 319/509 [49:46<33:43, 10.65s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|██████████████████████████████████████████████████▊ | 319/509 [49:46<33:43, 10.65s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|██████████████████████████████████████████████████▊ | 319/509 [49:46<33:43, 10.65s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|██████████████████████████████████████████████████▊ | 319/509 [49:46<33:43, 10.65s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|██████████████████████████████████████████████████▊ | 319/509 [49:46<33:43, 10.65s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|██████████████████████████████████████████████████▉ | 320/509 [49:57<33:15, 10.56s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|██████████████████████████████████████████████████▉ | 320/509 [49:57<33:15, 10.56s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|██████████████████████████████████████████████████▉ | 320/509 [49:57<33:15, 10.56s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
63%|██████████████████████████████████████████████████▉ | 320/509 [49:57<33:15, 10.56s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|██████████████████████████████████████████████████▉ | 320/509 [49:57<33:15, 10.56s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████ | 321/509 [50:07<32:53, 10.50s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████ | 321/509 [50:07<32:53, 10.50s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████ | 321/509 [50:07<32:53, 10.50s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████ | 321/509 [50:07<32:53, 10.50s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████ | 321/509 [50:07<32:53, 10.50s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████▏ | 322/509 [50:17<32:30, 10.43s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed 63%|███████████████████████████████████████████████████▏ | 322/509 [50:17<32:30, 10.43s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████▏ | 322/509 [50:17<32:30, 10.43s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████▏ | 322/509 [50:17<32:30, 10.43s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████▏ | 322/509 [50:17<32:30, 10.43s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████▍ | 323/509 [50:27<31:59, 10.32s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████▍ | 323/509 [50:27<31:59, 10.32s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████▍ | 323/509 [50:27<31:59, 10.32s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 63%|███████████████████████████████████████████████████▍ | 323/509 [50:27<31:59, 10.32s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed 63%|███████████████████████████████████████████████████▍ | 323/509 [50:27<31:59, 10.32s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▌ | 324/509 [50:38<31:34, 10.24s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▌ | 324/509 [50:38<31:34, 10.24s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▌ | 324/509 [50:38<31:34, 10.24s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▌ | 324/509 [50:38<31:34, 10.24s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▌ | 324/509 [50:38<31:34, 10.24s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▋ | 325/509 [50:48<31:43, 10.34s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▋ | 325/509 [50:48<31:43, 10.34s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed 64%|███████████████████████████████████████████████████▋ | 325/509 [50:48<31:43, 10.34s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▋ | 325/509 [50:48<31:43, 10.34s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▋ | 325/509 [50:48<31:43, 10.34s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▉ | 326/509 [50:58<31:10, 10.22s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▉ | 326/509 [50:58<31:10, 10.22s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▉ | 326/509 [50:58<31:10, 10.22s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|███████████████████████████████████████████████████▉ | 326/509 [50:58<31:10, 10.22s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|████████████████████████████████████████████████████ | 327/509 [51:08<30:39, 10.10s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, 
floating-point operations will not be computed 64%|████████████████████████████████████████████████████ | 327/509 [51:08<30:39, 10.10s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1815, 'learning_rate': 6.480000000000001e-06, 'epoch': 0.64} 64%|████████████████████████████████████████████████████ | 327/509 [51:08<30:39, 10.10s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|████████████████████████████████████████████████████ | 327/509 [51:08<30:39, 10.10s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|████████████████████████████████████████████████████▏ | 328/509 [51:18<30:04, 9.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|████████████████████████████████████████████████████▏ | 328/509 [51:18<30:04, 9.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3103, 'learning_rate': 6.5000000000000004e-06, 'epoch': 0.64} 64%|████████████████████████████████████████████████████▏ | 328/509 [51:18<30:04, 9.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 64%|████████████████████████████████████████████████████▏ | 328/509 [51:18<30:04, 9.97s/it]g-point operations will not be computed-01 00:00:18,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
65%|████████████████████████████████████████████████████▎ | 329/509 [51:27<29:39, 9.88s/it]
{'loss': 4.1042, 'learning_rate': 6.520000000000001e-06, 'epoch': 0.65}
65%|████████████████████████████████████████████████████▌ | 330/509 [51:37<29:13, 9.79s/it]
{'loss': 4.3816, 'learning_rate': 6.540000000000001e-06, 'epoch': 0.65}
65%|████████████████████████████████████████████████████▋ | 331/509 [51:46<28:47, 9.71s/it]
{'loss': 4.2755, 'learning_rate': 6.560000000000001e-06, 'epoch': 0.65}
65%|████████████████████████████████████████████████████▊ | 332/509 [51:56<28:21, 9.61s/it]
{'loss': 4.1153, 'learning_rate': 6.5800000000000005e-06, 'epoch': 0.65}
65%|████████████████████████████████████████████████████▉ | 333/509 [52:05<27:49, 9.49s/it]
{'loss': 4.0359, 'learning_rate': 6.600000000000001e-06, 'epoch': 0.65}
{'loss': 4.2784, 'learning_rate': 6.620000000000001e-06, 'epoch': 0.66}
66%|█████████████████████████████████████████████████████▎ | 335/509 [52:23<26:42, 9.21s/it]
66%|█████████████████████████████████████████████████████▍ | 336/509 [52:32<26:14, 9.10s/it]
66%|█████████████████████████████████████████████████████▋ | 337/509 [52:40<25:42, 8.97s/it]
{'loss': 4.2839, 'learning_rate': 6.680000000000001e-06, 'epoch': 0.66}
66%|█████████████████████████████████████████████████████▊ | 338/509 [52:49<25:05, 8.80s/it]
{'loss': 4.3273, 'learning_rate': 6.700000000000001e-06, 'epoch': 0.66}
67%|█████████████████████████████████████████████████████▉ | 339/509 [52:57<24:18, 8.58s/it]
{'loss': 4.1638, 'learning_rate': 6.720000000000001e-06, 'epoch': 0.67}
67%|██████████████████████████████████████████████████████ | 340/509 [53:05<23:28, 8.33s/it]
{'loss': 4.2066, 'learning_rate': 6.740000000000001e-06, 'epoch': 0.67}
67%|██████████████████████████████████████████████████████▎ | 341/509 [53:12<22:27, 8.02s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:14,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
67%|██████████████████████████████████████████████████████▍ | 342/509 [53:19<21:21, 7.68s/it]
{'loss': 4.3601, 'learning_rate': 6.780000000000001e-06, 'epoch': 0.67}
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:22,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
67%|██████████████████████████████████████████████████████▌ | 343/509 [53:25<20:08, 7.28s/it]
{'loss': 4.5382, 'learning_rate': 6.800000000000001e-06, 'epoch': 0.67}
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:28,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|██████████████████████████████████████████████████████▋ | 344/509 [53:31<18:51, 6.86s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:32,443 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:34,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.301, 'learning_rate': 6.8400000000000014e-06, 'epoch': 0.68}
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:38,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|███████████████████████████████████████████████████████ | 346/509 [53:41<15:48, 5.82s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:40,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:42,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|███████████████████████████████████████████████████████▏ | 347/509 [53:45<14:19, 5.30s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:44,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:46,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|███████████████████████████████████████████████████████▍ | 348/509 [53:48<12:54, 4.81s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:48,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
69%|███████████████████████████████████████████████████████▌ | 349/509 [53:52<11:27, 4.30s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:51,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:52,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
69%|███████████████████████████████████████████████████████▋ | 350/509 [53:55<10:27, 3.95s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
69%|███████████████████████████████████████████████████████▊ | 351/509 [54:07<17:10, 6.52s/it]
69%|████████████████████████████████████████████████████████ | 352/509 [54:19<21:08, 8.08s/it]
69%|████████████████████████████████████████████████████████▏ | 353/509 [54:31<23:46, 9.14s/it]
{'loss': 4.2144, 'learning_rate': 7e-06, 'epoch': 0.69}
70%|████████████████████████████████████████████████████████▎ | 354/509 [54:42<25:26, 9.85s/it]
{'loss': 4.2597, 'learning_rate': 7.0200000000000006e-06, 'epoch': 0.69}
70%|████████████████████████████████████████████████████████▍ | 355/509 [54:53<26:26, 10.30s/it]
{'loss': 4.2307, 'learning_rate': 7.04e-06, 'epoch': 0.7}
70%|████████████████████████████████████████████████████████▋ | 356/509 [55:05<27:01, 10.60s/it]
{'loss': 4.2258, 'learning_rate': 7.06e-06, 'epoch': 0.7}
70%|████████████████████████████████████████████████████████▊ | 357/509 [55:16<27:24, 10.82s/it]
70%|████████████████████████████████████████████████████████▉ | 358/509 [55:27<27:31, 10.94s/it]
{'loss': 4.1364, 'learning_rate': 7.100000000000001e-06, 'epoch': 0.7}
71%|█████████████████████████████████████████████████████████▏ | 359/509 [55:38<27:30, 11.00s/it]
71%|█████████████████████████████████████████████████████████▎ | 360/509 [55:50<27:28, 11.06s/it]
{'loss': 4.3119, 'learning_rate': 7.14e-06, 'epoch': 0.71}
71%|█████████████████████████████████████████████████████████▍ | 361/509 [56:01<27:12, 11.03s/it]
71%|█████████████████████████████████████████████████████████▌ | 362/509 [56:11<26:56, 10.99s/it]
71%|█████████████████████████████████████████████████████████▊ | 363/509 [56:22<26:37, 10.95s/it]
{'loss': 4.2059, 'learning_rate': 7.2000000000000005e-06, 'epoch': 0.71}
72%|█████████████████████████████████████████████████████████▉ | 364/509 [56:33<26:12, 10.84s/it]
{'loss': 4.1819, 'learning_rate': 7.22e-06, 'epoch': 0.71}
{'loss': 4.2137, 'learning_rate': 7.24e-06, 'epoch': 0.72}
72%|██████████████████████████████████████████████████████████▏ | 366/509 [56:54<25:32, 
10.71s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 72%|██████████████████████████████████████████████████████████▏ | 366/509 [56:54<25:32, 10.71s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 72%|██████████████████████████████████████████████████████████▏ | 366/509 [56:54<25:32, 10.71s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 72%|██████████████████████████████████████████████████████████▏ | 366/509 [56:54<25:32, 10.71s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 72%|██████████████████████████████████████████████████████████▍ | 367/509 [57:05<25:13, 10.66s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 72%|██████████████████████████████████████████████████████████▍ | 367/509 [57:05<25:13, 10.66s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 72%|██████████████████████████████████████████████████████████▍ | 367/509 [57:05<25:13, 10.66s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 72%|██████████████████████████████████████████████████████████▍ | 367/509 [57:05<25:13, 10.66s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
72%|██████████████████████████████████████████████████████████▌ | 368/509 [57:15<24:54, 10.60s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2354, 'learning_rate': 7.3e-06, 'epoch': 0.72}
72%|██████████████████████████████████████████████████████████▌ | 368/509 [57:15<24:54, 10.60s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
72%|██████████████████████████████████████████████████████████▋ | 369/509 [57:25<24:34, 10.53s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|██████████████████████████████████████████████████████████▉ | 370/509 [57:36<24:17, 10.49s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|███████████████████████████████████████████████████████████ | 371/509 [57:46<24:01, 10.45s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|███████████████████████████████████████████████████████████▏ | 372/509 [57:56<23:43, 10.39s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|███████████████████████████████████████████████████████████▎ | 373/509 [58:07<23:25, 10.33s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|███████████████████████████████████████████████████████████▌ | 374/509 [58:17<23:06, 10.27s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
74%|███████████████████████████████████████████████████████████▋ | 375/509 [58:27<23:05, 10.34s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
74%|███████████████████████████████████████████████████████████▊ | 376/509 [58:37<22:38, 10.22s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
74%|███████████████████████████████████████████████████████████▉ | 377/509 [58:47<22:15, 10.12s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
74%|████████████████████████████████████████████████████████████▏ | 378/509 [58:57<21:50, 10.00s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1554, 'learning_rate': 7.500000000000001e-06, 'epoch': 0.74}
74%|████████████████████████████████████████████████████████████▏ | 378/509 [58:57<21:50, 10.00s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
74%|████████████████████████████████████████████████████████████▎ | 379/509 [59:06<21:25, 
9.89s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 74%|████████████████████████████████████████████████████████████▎ | 379/509 [59:06<21:25, 9.89s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1464, 'learning_rate': 7.520000000000001e-06, 'epoch': 0.74} 74%|████████████████████████████████████████████████████████████▎ | 379/509 [59:06<21:25, 9.89s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:13:14,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:13:14,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1956, 'learning_rate': 7.540000000000001e-06, 'epoch': 0.75} [WARNING|modeling_utils.py:388] 2022-03-01 00:13:14,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:13:14,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 
00:13:14,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▋ | 381/509 [59:25<20:41, 9.70s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▋ | 381/509 [59:25<20:41, 9.70s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2492, 'learning_rate': 7.5600000000000005e-06, 'epoch': 0.75} 75%|████████████████████████████████████████████████████████████▋ | 381/509 [59:25<20:41, 9.70s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▋ | 381/509 [59:25<20:41, 9.70s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▊ | 382/509 [59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▊ | 382/509 [59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.17, 'learning_rate': 7.58e-06, 'epoch': 0.75} 75%|████████████████████████████████████████████████████████████▊ | 382/509 
[59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▊ | 382/509 [59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▊ | 382/509 [59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▊ | 382/509 [59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2869, 'learning_rate': 7.600000000000001e-06, 'epoch': 0.75} 75%|████████████████████████████████████████████████████████████▊ | 382/509 [59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▊ | 382/509 [59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|████████████████████████████████████████████████████████████▊ | 382/509 [59:35<20:26, 9.66s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|█████████████████████████████████████████████████████████████ | 384/509 [59:54<19:39, 9.43s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 75%|█████████████████████████████████████████████████████████████ | 384/509 [59:54<19:39, 9.43s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|█████████████████████████████████████████████████████████████ | 384/509 [59:54<19:39, 9.43s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|█████████████████████████████████████████████████████████████ | 384/509 [59:54<19:39, 9.43s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 75%|█████████████████████████████████████████████████████████████ | 384/509 [59:54<19:39, 9.43s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|███████████████████████████████████████████████████████████▊ | 385/509 [1:00:03<19:14, 9.31s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|███████████████████████████████████████████████████████████▊ | 385/509 [1:00:03<19:14, 9.31s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|███████████████████████████████████████████████████████████▊ | 385/509 [1:00:03<19:14, 9.31s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|███████████████████████████████████████████████████████████▊ | 385/509 [1:00:03<19:14, 9.31s/it]g-point 
operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|███████████████████████████████████████████████████████████▉ | 386/509 [1:00:11<18:49, 9.18s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|███████████████████████████████████████████████████████████▉ | 386/509 [1:00:11<18:49, 9.18s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.134, 'learning_rate': 7.660000000000001e-06, 'epoch': 0.76} 76%|███████████████████████████████████████████████████████████▉ | 386/509 [1:00:11<18:49, 9.18s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:14:18,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:14:18,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1781, 'learning_rate': 7.680000000000001e-06, 'epoch': 0.76} [WARNING|modeling_utils.py:388] 2022-03-01 00:14:18,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:14:18,950 >> Could not estimate the 
number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:14:18,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|████████████████████████████████████████████████████████████▏ | 388/509 [1:00:29<17:58, 8.92s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|████████████████████████████████████████████████████████████▏ | 388/509 [1:00:29<17:58, 8.92s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 76%|████████████████████████████████████████████████████████████▏ | 388/509 [1:00:29<17:58, 8.92s/it]g-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:14:35,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:14:35,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-01 00:07:56,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2041, 'learning_rate': 7.72e-06, 'epoch': 0.76} [WARNING|modeling_utils.py:388] 2022-03-01 00:14:35,755 >> Could not estimate the number of tokens of the input, 
floating-point operations will not be computed 77%|████████████████████████████████████████████████████████████▌ | 390/509 [1:00:45<16:49, 8.48s/it] {'loss': 4.2828, 'learning_rate': 7.74e-06, 'epoch': 0.77} 77%|████████████████████████████████████████████████████████████▌ | 390/509 [1:00:45<16:49, 8.48s/it] 77%|████████████████████████████████████████████████████████████▋ | 391/509 [1:00:53<16:09, 8.22s/it] {'loss': 4.2383, 'learning_rate': 7.76e-06, 'epoch': 0.77}
77%|████████████████████████████████████████████████████████████▋ | 391/509 [1:00:53<16:09, 8.22s/it] [WARNING|modeling_utils.py:388] 2022-03-01 00:14:58,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.375, 'learning_rate': 7.78e-06, 'epoch': 0.77} [WARNING|modeling_utils.py:388] 2022-03-01 00:14:58,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 77%|████████████████████████████████████████████████████████████▉ | 393/509 [1:01:06<14:23, 7.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:06,425 >>
Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2837, 'learning_rate': 7.800000000000002e-06, 'epoch': 0.77} 77%|████████████████████████████████████████████████████████████▉ | 393/509 [1:01:06<14:23, 7.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:06,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 77%|█████████████████████████████████████████████████████████████▏ | 394/509 [1:01:12<13:23, 6.99s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:12,255 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:15:16,133 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:15:18,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
78%|█████████████████████████████████████████████████████████████▍ | 396/509 [1:01:22<11:19, 6.01s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:22,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-01 00:15:24,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 78%|█████████████████████████████████████████████████████████████▌ | 397/509 [1:01:27<10:17, 5.51s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:26,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
78%|█████████████████████████████████████████████████████████████▊ | 398/509 [1:01:30<09:14, 5.00s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:30,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 78%|█████████████████████████████████████████████████████████████▉ | 399/509 [1:01:33<08:08, 4.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:33,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 79%|██████████████████████████████████████████████████████████████ | 400/509 [1:01:37<07:18, 4.02s/it] {'loss': 4.5583, 'learning_rate': 7.94e-06, 'epoch': 0.78} 79%|██████████████████████████████████████████████████████████████ | 400/509 [1:01:37<07:18,
4.02s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 79%|██████████████████████████████████████████████████████████████▏ | 401/509 [1:01:49<11:35, 6.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2798, 'learning_rate': 7.960000000000002e-06, 'epoch': 0.79} 79%|██████████████████████████████████████████████████████████████▏ | 401/509 [1:01:49<11:35, 6.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 79%|██████████████████████████████████████████████████████████████▍ | 402/509 [1:02:00<14:14, 7.99s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate
the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1779, 'learning_rate': 7.980000000000002e-06, 'epoch': 0.79} 79%|██████████████████████████████████████████████████████████████▍ | 402/509 [1:02:00<14:14, 7.99s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 79%|██████████████████████████████████████████████████████████████▌ | 403/509 [1:02:12<15:58, 9.05s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
79%|██████████████████████████████████████████████████████████████▋ | 404/509 [1:02:23<17:03, 9.75s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.0382, 'learning_rate': 8.020000000000001e-06, 'epoch': 0.79} 79%|██████████████████████████████████████████████████████████████▋ | 404/509 [1:02:23<17:03, 9.75s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 80%|██████████████████████████████████████████████████████████████▊ | 405/509
[1:02:34<17:44, 10.23s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 80%|███████████████████████████████████████████████████████████████ | 406/509 [1:02:46<18:05, 10.53s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1735, 'learning_rate': 8.06e-06, 'epoch': 0.8} 80%|███████████████████████████████████████████████████████████████ | 406/509 [1:02:46<18:05, 10.53s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could
not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1883, 'learning_rate': 8.08e-06, 'epoch': 0.8} 80%|███████████████████████████████████████████████████████████████ | 406/509 [1:02:46<18:05, 10.53s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 80%|███████████████████████████████████████████████████████████████▎ | 408/509 [1:03:08<18:15, 10.85s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1494, 'learning_rate': 8.1e-06, 'epoch': 0.8} 80%|███████████████████████████████████████████████████████████████▎ | 408/509 [1:03:08<18:15, 10.85s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of
tokens of the input, floating-point operations will not be computed 80%|███████████████████████████████████████████████████████████████▍ | 409/509 [1:03:19<18:13, 10.93s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1361, 'learning_rate': 8.120000000000002e-06, 'epoch': 0.8} 80%|███████████████████████████████████████████████████████████████▍ | 409/509 [1:03:19<18:13, 10.93s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
81%|███████████████████████████████████████████████████████████████▋ | 410/509 [1:03:30<18:07, 10.99s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 81%|███████████████████████████████████████████████████████████████▊ | 411/509 [1:03:41<17:51, 10.94s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 81%|███████████████████████████████████████████████████████████████▉ | 412/509 [1:03:52<17:36, 10.90s/it][WARNING|modeling_utils.py:388]
2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2065, 'learning_rate': 8.18e-06, 'epoch': 0.81} 81%|███████████████████████████████████████████████████████████████▉ | 412/509 [1:03:52<17:36, 10.90s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 81%|████████████████████████████████████████████████████████████████ | 413/509 [1:04:03<17:21, 10.85s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input,
floating-point operations will not be computed 81%|████████████████████████████████████████████████████████████████▎ | 414/509 [1:04:13<17:07, 10.82s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 82%|████████████████████████████████████████████████████████████████▍ | 415/509
[1:04:24<16:52, 10.77s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 82%|████████████████████████████████████████████████████████████████▌ | 416/509 [1:04:35<16:37, 10.73s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the
input, floating-point operations will not be computed 82%|████████████████████████████████████████████████████████████████▋ | 417/509 [1:04:45<16:21, 10.67s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 82%|████████████████████████████████████████████████████████████████▉ | 418/509
[1:04:56<16:02, 10.58s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2029, 'learning_rate': 8.3e-06, 'epoch': 0.82} 82%|████████████████████████████████████████████████████████████████▉ | 418/509 [1:04:56<16:02, 10.58s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 82%|█████████████████████████████████████████████████████████████████ | 419/509 [1:05:06<15:49, 10.55s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1192, 'learning_rate': 8.32e-06, 'epoch': 0.82} 82%|█████████████████████████████████████████████████████████████████ | 419/509 [1:05:06<15:49,
10.55s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▏ | 420/509 [1:05:16<15:34, 10.50s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▎ | 421/509 [1:05:27<15:18, 10.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input,
floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▎ | 421/509 [1:05:27<15:18, 10.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▎ | 421/509 [1:05:27<15:18, 10.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▎ | 421/509 [1:05:27<15:18, 10.44s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▍ | 422/509 [1:05:37<15:03, 10.38s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▍ | 422/509 [1:05:37<15:03, 10.38s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▍ | 422/509 [1:05:37<15:03, 10.38s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▍ | 422/509 [1:05:37<15:03, 10.38s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▋ | 423/509 
[1:05:47<14:45, 10.30s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▋ | 423/509 [1:05:47<14:45, 10.30s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2221, 'learning_rate': 8.400000000000001e-06, 'epoch': 0.83} 83%|█████████████████████████████████████████████████████████████████▋ | 423/509 [1:05:47<14:45, 10.30s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▋ | 423/509 [1:05:47<14:45, 10.30s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 [1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 [1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.0232, 'learning_rate': 8.42e-06, 'epoch': 0.83} 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 [1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 
[1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 [1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 [1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2479, 'learning_rate': 8.44e-06, 'epoch': 0.83} 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 [1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 [1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 83%|█████████████████████████████████████████████████████████████████▊ | 424/509 [1:05:57<14:30, 10.24s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████ | 426/509 [1:06:18<14:12, 10.27s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████ | 426/509 [1:06:18<14:12, 10.27s/it][WARNING|modeling_utils.py:388] 2022-03-01 
00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1541, 'learning_rate': 8.46e-06, 'epoch': 0.84} 84%|██████████████████████████████████████████████████████████████████ | 426/509 [1:06:18<14:12, 10.27s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████ | 426/509 [1:06:18<14:12, 10.27s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▎ | 427/509 [1:06:28<13:50, 10.13s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▎ | 427/509 [1:06:28<13:50, 10.13s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1702, 'learning_rate': 8.48e-06, 'epoch': 0.84} 84%|██████████████████████████████████████████████████████████████████▎ | 427/509 [1:06:28<13:50, 10.13s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▎ | 427/509 [1:06:28<13:50, 10.13s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 
00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.18, 'learning_rate': 8.5e-06, 'epoch': 0.84} 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.003, 'learning_rate': 8.52e-06, 'epoch': 0.84} 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> 
Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▍ | 428/509 [1:06:37<13:27, 9.97s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▋ | 430/509 [1:06:56<12:50, 9.75s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▋ | 430/509 [1:06:56<12:50, 9.75s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▋ | 430/509 [1:06:56<12:50, 9.75s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▋ | 430/509 [1:06:56<12:50, 9.75s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 84%|██████████████████████████████████████████████████████████████████▋ | 430/509 [1:06:56<12:50, 9.75s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 85%|██████████████████████████████████████████████████████████████████▉ | 431/509 [1:07:06<12:35, 9.68s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
85%|███████████████████████████████████████████████████████████████████ | 432/509 [1:07:15<12:18, 9.59s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2373, 'learning_rate': 8.580000000000001e-06, 'epoch': 0.85}
85%|███████████████████████████████████████████████████████████████████▏ | 433/509 [1:07:25<12:00, 9.48s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
85%|███████████████████████████████████████████████████████████████████▎ | 434/509 [1:07:34<11:42, 9.36s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1141, 'learning_rate': 8.62e-06, 'epoch': 0.85}
{'loss': 4.1573, 'learning_rate': 8.64e-06, 'epoch': 0.85}
86%|███████████████████████████████████████████████████████████████████▋ | 436/509 [1:07:51<11:02, 9.08s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
86%|███████████████████████████████████████████████████████████████████▊ | 437/509 [1:08:00<10:42, 8.93s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
86%|███████████████████████████████████████████████████████████████████▉ | 438/509 [1:08:08<10:20, 8.74s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
86%|████████████████████████████████████████████████████████████████████▏ | 439/509 [1:08:16<09:57, 8.54s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
86%|████████████████████████████████████████████████████████████████████▎ | 440/509 [1:08:24<09:32, 8.30s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
87%|████████████████████████████████████████████████████████████████████▍ | 441/509 [1:08:31<09:06, 8.03s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:15:38,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:22:33,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
87%|████████████████████████████████████████████████████████████████████▌ | 442/509 [1:08:38<08:40, 7.77s/it]
[WARNING|modeling_utils.py:388] 2022-03-01 00:22:44,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1311, 'learning_rate': 8.8e-06, 'epoch': 0.87}
[WARNING|modeling_utils.py:388] 2022-03-01 00:22:50,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2058, 'learning_rate': 8.82e-06, 'epoch': 0.87}
[WARNING|modeling_utils.py:388] 2022-03-01 00:22:55,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2934, 'learning_rate': 8.84e-06, 'epoch': 0.87}
88%|█████████████████████████████████████████████████████████████████████▏ | 446/509 [1:09:02<06:21, 6.06s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:23:01,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:23:03,703 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
88%|█████████████████████████████████████████████████████████████████████▍ | 447/509 [1:09:06<05:41, 5.51s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:23:05,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
88%|█████████████████████████████████████████████████████████████████████▌ | 448/509 [1:09:09<05:01, 4.95s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:23:09,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
88%|█████████████████████████████████████████████████████████████████████▋ | 449/509 [1:09:13<04:24, 4.41s/it][WARNING|modeling_utils.py:388] 2022-03-01 00:23:12,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-01 00:23:13,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
Traceback (most recent call last):