0%| | 0/1019 [00:00> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:32:51,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%| | 1/1019 [00:07<2:02:31, 7.22s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:32:54,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:32:57,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%|▏ | 2/1019 [00:13<1:52:49, 6.66s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:00,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:04,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%|▏ | 3/1019 [00:19<1:51:04, 6.56s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:07,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:10,423 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%|▎ | 4/1019 [00:26<1:48:41, 6.42s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:13,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:16,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%|▍ | 5/1019 [00:32<1:46:18, 6.29s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:19,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:22,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|▍ | 6/1019 [00:38<1:45:19, 6.24s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:25,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:28,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|▌ | 7/1019 [00:44<1:44:04, 6.17s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:31,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:34,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|▋ | 8/1019 [00:50<1:42:44, 6.10s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:37,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:40,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|▋ | 9/1019 [00:56<1:41:51, 6.05s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:43,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:46,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|▊ | 10/1019 [01:02<1:40:48, 5.99s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:49,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8686, 'learning_rate': 1.6e-07, 'epoch': 0.01}
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:52,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|▊ | 11/1019 [01:08<1:40:32, 5.98s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:55,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.889, 'learning_rate': 1.8e-07, 'epoch': 0.01}
[WARNING|modeling_utils.py:388] 2022-02-28 22:33:58,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|▉ | 12/1019 [01:14<1:40:14, 5.97s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:01,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8954, 'learning_rate': 2.0000000000000002e-07, 'epoch': 0.01}
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:04,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|█ | 13/1019 [01:19<1:39:29, 5.93s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:07,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8095, 'learning_rate': 2.2e-07, 'epoch': 0.01}
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:09,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|█ | 14/1019 [01:25<1:38:41, 5.89s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:12,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7527, 'learning_rate': 2.4000000000000003e-07, 'epoch': 0.01}
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:15,596 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
1%|█▏ | 15/1019 [01:31<1:37:21, 5.82s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:18,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.989, 'learning_rate': 2.6e-07, 'epoch': 0.02}
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:21,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▏ | 16/1019 [01:36<1:36:07, 5.75s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:24,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:26,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8145, 'learning_rate': 2.8e-07, 'epoch': 0.02}
2%|█▎ | 17/1019 [01:42<1:35:45, 5.73s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:29,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.1036, 'learning_rate': 3.0000000000000004e-07, 'epoch': 0.02}
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:32,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▍ | 18/1019 [01:48<1:34:13, 5.65s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:35,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8358, 'learning_rate': 3.2e-07, 'epoch': 0.02}
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:37,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▍ | 19/1019 [01:53<1:33:24, 5.60s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:40,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:43,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▌ | 20/1019 [01:59<1:32:43, 5.57s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:46,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8954, 'learning_rate': 3.6e-07, 'epoch': 0.02}
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:48,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▋ | 21/1019 [02:04<1:31:47, 5.52s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:51,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7761, 'learning_rate': 3.8e-07, 'epoch': 0.02}
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:54,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▋ | 22/1019 [02:09<1:30:51, 5.47s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:56,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:34:59,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▊ | 23/1019 [02:15<1:30:14, 5.44s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:02,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.1191, 'learning_rate': 4.2000000000000006e-07, 'epoch': 0.02}
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:04,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▊ | 24/1019 [02:20<1:29:39, 5.41s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:07,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:09,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▉ | 25/1019 [02:25<1:28:34, 5.35s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:12,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:15,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██ | 26/1019 [02:30<1:27:47, 5.31s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:17,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:20,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██ | 27/1019 [02:36<1:27:09, 5.27s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:22,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:25,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▏ | 28/1019 [02:41<1:26:00, 5.21s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:28,047 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:30,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▏ | 29/1019 [02:46<1:25:21, 5.17s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:33,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:35,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▎ | 30/1019 [02:51<1:24:45, 5.14s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:38,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:40,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▍ | 31/1019 [02:56<1:24:09, 5.11s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:43,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7502, 'learning_rate': 5.800000000000001e-07, 'epoch': 0.03}
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:45,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▍ | 32/1019 [03:01<1:23:02, 5.05s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:48,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7812, 'learning_rate': 6.000000000000001e-07, 'epoch': 0.03}
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:50,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▌ | 33/1019 [03:06<1:22:06, 5.00s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:52,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7388, 'learning_rate': 6.200000000000001e-07, 'epoch': 0.03}
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:55,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▋ | 34/1019 [03:10<1:21:02, 4.94s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:57,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:35:59,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
3%|██▋ | 35/1019 [03:15<1:19:50, 4.87s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:02,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9259, 'learning_rate': 6.6e-07, 'epoch': 0.04}
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:04,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|██▊ | 36/1019 [03:20<1:18:28, 4.79s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:06,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:09,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|██▊ | 37/1019 [03:24<1:17:15, 4.72s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:11,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:13,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|██▉ | 38/1019 [03:29<1:15:14, 4.60s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:15,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:17,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███ | 39/1019 [03:33<1:13:15, 4.49s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:19,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:21,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███ | 40/1019 [03:37<1:11:09, 4.36s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:23,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:25,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███▏ | 41/1019 [03:41<1:08:42, 4.22s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:27,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:29,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███▎ | 42/1019 [03:44<1:05:40, 4.03s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:30,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7547, 'learning_rate': 8.000000000000001e-07, 'epoch': 0.04}
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:32,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███▎ | 43/1019 [03:48<1:02:03, 3.82s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:34,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:35,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███▍ | 44/1019 [03:51<58:15, 3.59s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:36,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:38,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
4%|███▌ | 45/1019 [03:53<53:39, 3.31s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:39,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.2672, 'learning_rate': 8.6e-07, 'epoch': 0.05}
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:40,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|███▋ | 46/1019 [03:56<48:49, 3.01s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:41,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.1714, 'learning_rate': 8.8e-07, 'epoch': 0.05}
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:42,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|███▋ | 47/1019 [03:58<44:14, 2.73s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:43,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:44,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|███▊ | 48/1019 [04:00<39:56, 2.47s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:45,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:46,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|███▉ | 49/1019 [04:01<35:40, 2.21s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:46,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.9028, 'learning_rate': 9.200000000000001e-07, 'epoch': 0.05}
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:47,979 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|███▉ | 50/1019 [04:03<34:03, 2.11s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:51,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|████ | 51/1019 [04:10<54:56, 3.41s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:36:57,469 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|████ | 52/1019 [04:16<1:07:57, 4.22s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:03,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|████ | 53/1019 [04:22<1:16:45, 4.77s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:09,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|████▏ | 54/1019 [04:28<1:22:47, 5.15s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:15,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|████▎ | 55/1019 [04:34<1:26:09, 5.36s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:21,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|████▎ | 56/1019 [04:40<1:28:43, 5.53s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:27,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▍ | 57/1019 [04:45<1:30:30, 5.65s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:33,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▍ | 58/1019 [04:51<1:31:24, 5.71s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:39,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▌ | 59/1019 [04:57<1:31:44, 5.73s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:44,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▋ | 60/1019 [05:03<1:32:13, 5.77s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:50,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▋ | 61/1019 [05:09<1:31:51, 5.75s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:37:56,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▊ | 62/1019 [05:14<1:31:23, 5.73s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:02,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▉ | 63/1019 [05:20<1:30:36, 5.69s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:07,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▉ | 64/1019 [05:26<1:30:10, 5.67s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:13,144 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|█████ | 65/1019 [05:31<1:29:18, 5.62s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:18,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|█████ | 66/1019 [05:37<1:28:20, 5.56s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:24,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▏ | 67/1019 [05:42<1:28:13, 5.56s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:29,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▎ | 68/1019 [05:48<1:28:09, 5.56s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:35,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▎ | 69/1019 [05:53<1:27:25, 5.52s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:40,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▍ | 70/1019 [05:58<1:26:30, 5.47s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:45,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▌ | 71/1019 [06:04<1:25:44, 5.43s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:51,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▌ | 72/1019 [06:09<1:25:28, 5.42s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:38:56,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▋ | 73/1019 [06:14<1:24:41, 5.37s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:39:01,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▋ | 74/1019 [06:20<1:24:00, 5.33s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:39:07,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▊ | 75/1019 [06:25<1:23:23, 5.30s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:39:12,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▉ | 76/1019 [06:30<1:22:45, 5.27s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:39:17,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|█████▉ | 77/1019 [06:35<1:21:41, 5.20s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:39:22,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|██████ | 78/1019 [06:40<1:20:57, 5.16s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:39:27,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|██████ | 79/1019 [06:45<1:20:42, 5.15s/it]
operations will not be computed-28 22:39:27,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████ | 79/1019 [06:45<1:20:42, 5.15s/it]g-point operations will not be computed-28 22:39:27,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████ | 79/1019 [06:45<1:20:42, 5.15s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:39:32,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████ | 79/1019 [06:45<1:20:42, 5.15s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:39:32,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▏ | 80/1019 [06:50<1:20:01, 5.11s/it]g-point operations will not be computed-28 22:39:32,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▏ | 80/1019 [06:50<1:20:01, 5.11s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:39:37,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▎ | 81/1019 [06:55<1:19:10, 5.06s/it]g-point operations will not be computed-28 22:39:37,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▎ | 81/1019 [06:55<1:19:10, 5.06s/it]g-point operations will not be computed-28 22:39:37,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▎ | 81/1019 [06:55<1:19:10, 5.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:39:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▎ | 81/1019 [06:55<1:19:10, 5.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:39:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed 8%|██████▎ | 82/1019 [07:00<1:18:04, 5.00s/it]g-point operations will not be computed-28 22:39:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▎ | 82/1019 [07:00<1:18:04, 5.00s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:39:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▍ | 83/1019 [07:05<1:17:21, 4.96s/it]g-point operations will not be computed-28 22:39:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▍ | 83/1019 [07:05<1:17:21, 4.96s/it]g-point operations will not be computed-28 22:39:47,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▍ | 83/1019 [07:05<1:17:21, 4.96s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:39:52,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▌ | 84/1019 [07:10<1:16:32, 4.91s/it]g-point operations will not be computed-28 22:39:52,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▌ | 84/1019 [07:10<1:16:32, 4.91s/it]g-point operations will not be computed-28 22:39:52,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▌ | 84/1019 [07:10<1:16:32, 4.91s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:39:56,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▌ | 85/1019 [07:14<1:14:55, 4.81s/it]g-point operations will not be computed-28 22:39:56,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▌ | 85/1019 [07:14<1:14:55, 4.81s/it]g-point operations will not be computed-28 22:39:56,905 >> Could not estimate the number 
of tokens of the input, floating-point operations will not be computed 8%|██████▌ | 85/1019 [07:14<1:14:55, 4.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:01,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▌ | 85/1019 [07:14<1:14:55, 4.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:01,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▋ | 86/1019 [07:19<1:13:56, 4.75s/it]g-point operations will not be computed-28 22:40:01,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▋ | 86/1019 [07:19<1:13:56, 4.75s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:06,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 8%|██████▋ | 86/1019 [07:19<1:13:56, 4.75s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:06,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▋ | 87/1019 [07:23<1:12:35, 4.67s/it]g-point operations will not be computed-28 22:40:06,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▋ | 87/1019 [07:23<1:12:35, 4.67s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:10,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▊ | 88/1019 [07:28<1:10:53, 4.57s/it]g-point operations will not be computed-28 22:40:10,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▊ | 88/1019 [07:28<1:10:53, 4.57s/it]g-point operations will not be computed-28 22:40:10,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▊ | 88/1019 [07:28<1:10:53, 
4.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:14,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▉ | 89/1019 [07:32<1:08:45, 4.44s/it]g-point operations will not be computed-28 22:40:14,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▉ | 89/1019 [07:32<1:08:45, 4.44s/it]g-point operations will not be computed-28 22:40:14,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▉ | 89/1019 [07:32<1:08:45, 4.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:18,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▉ | 89/1019 [07:32<1:08:45, 4.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:18,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▉ | 90/1019 [07:36<1:05:55, 4.26s/it]g-point operations will not be computed-28 22:40:18,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▉ | 90/1019 [07:36<1:05:55, 4.26s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:22,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|██████▉ | 90/1019 [07:36<1:05:55, 4.26s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:22,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████ | 91/1019 [07:39<1:03:03, 4.08s/it]g-point operations will not be computed-28 22:40:22,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████ | 91/1019 [07:39<1:03:03, 4.08s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:25,991 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed 9%|███████ | 91/1019 [07:39<1:03:03, 4.08s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:25,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▏ | 92/1019 [07:43<1:00:20, 3.91s/it]g-point operations will not be computed-28 22:40:25,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▏ | 92/1019 [07:43<1:00:20, 3.91s/it]g-point operations will not be computed-28 22:40:25,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:40:30,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:40:29,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:40:30,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:40:29,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▍ | 93/1019 [07:46<57:22, 3.72s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:32,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▍ | 93/1019 [07:46<57:22, 3.72s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:32,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▍ | 94/1019 [07:49<54:01, 3.50s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:35,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▌ | 95/1019 [07:52<50:42, 3.29s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:35,483 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed 9%|███████▌ | 95/1019 [07:52<50:42, 3.29s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:35,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▋ | 96/1019 [07:55<47:06, 3.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:38,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 9%|███████▋ | 96/1019 [07:55<47:06, 3.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:38,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▋ | 97/1019 [07:57<43:03, 2.80s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:40,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▋ | 97/1019 [07:57<43:03, 2.80s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:40,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 98/1019 [07:59<39:06, 2.55s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:42,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 98/1019 [07:59<39:06, 2.55s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:42,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 99/1019 [08:00<35:16, 2.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:44,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 99/1019 [08:00<35:16, 2.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:44,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 100/1019 [08:02<33:37, 2.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:46,000 >> Could not 
estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 100/1019 [08:02<33:37, 2.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:46,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 100/1019 [08:02<33:37, 2.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:50,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 101/1019 [08:09<52:53, 3.46s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:50,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 101/1019 [08:09<52:53, 3.46s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:50,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 101/1019 [08:09<52:53, 3.46s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:56,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 102/1019 [08:15<1:04:39, 4.23s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:56,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 102/1019 [08:15<1:04:39, 4.23s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:40:56,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▊ | 102/1019 [08:15<1:04:39, 4.23s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:02,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 103/1019 [08:21<1:12:46, 4.77s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:02,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 103/1019 
[08:21<1:12:46, 4.77s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:02,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 103/1019 [08:21<1:12:46, 4.77s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:08,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 104/1019 [08:27<1:17:36, 5.09s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:08,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 104/1019 [08:27<1:17:36, 5.09s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:08,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|███████▉ | 104/1019 [08:27<1:17:36, 5.09s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:14,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████ | 105/1019 [08:33<1:21:10, 5.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:14,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████ | 105/1019 [08:33<1:21:10, 5.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:14,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████ | 105/1019 [08:33<1:21:10, 5.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:20,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████ | 105/1019 [08:33<1:21:10, 5.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:20,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████ | 106/1019 [08:38<1:23:21, 5.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:20,275 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 10%|████████ | 106/1019 [08:38<1:23:21, 5.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:26,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████ | 106/1019 [08:38<1:23:21, 5.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:26,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 10%|████████ | 106/1019 [08:38<1:23:21, 5.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:26,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▏ | 107/1019 [08:44<1:24:38, 5.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:26,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▏ | 107/1019 [08:44<1:24:38, 5.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:31,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▎ | 108/1019 [08:50<1:25:41, 5.64s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:31,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▎ | 108/1019 [08:50<1:25:41, 5.64s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:31,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▎ | 108/1019 [08:50<1:25:41, 5.64s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:37,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▎ | 109/1019 [08:56<1:26:22, 5.70s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:37,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▎ | 109/1019 [08:56<1:26:22, 
5.70s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:37,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▎ | 109/1019 [08:56<1:26:22, 5.70s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:43,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▍ | 110/1019 [09:02<1:26:41, 5.72s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:43,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▍ | 110/1019 [09:02<1:26:41, 5.72s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:43,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▍ | 110/1019 [09:02<1:26:41, 5.72s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:49,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▍ | 111/1019 [09:07<1:25:53, 5.68s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:49,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▍ | 111/1019 [09:07<1:25:53, 5.68s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:49,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▍ | 111/1019 [09:07<1:25:53, 5.68s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:54,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▌ | 112/1019 [09:13<1:25:22, 5.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:54,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▌ | 112/1019 [09:13<1:25:22, 5.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:41:54,764 >> Could not estimate the number of tokens 
of the input, floating-point operations will not be computed 11%|████████▌ | 112/1019 [09:13<1:25:22, 5.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:00,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▋ | 113/1019 [09:18<1:25:20, 5.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:00,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▋ | 113/1019 [09:18<1:25:20, 5.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:00,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▋ | 113/1019 [09:18<1:25:20, 5.65s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:06,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▋ | 114/1019 [09:24<1:24:54, 5.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:06,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▋ | 114/1019 [09:24<1:24:54, 5.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:06,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▋ | 114/1019 [09:24<1:24:54, 5.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:11,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▋ | 114/1019 [09:24<1:24:54, 5.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:11,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▊ | 115/1019 [09:30<1:24:28, 5.61s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:11,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▊ | 115/1019 [09:30<1:24:28, 
5.61s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:17,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▉ | 116/1019 [09:35<1:23:36, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:17,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▉ | 116/1019 [09:35<1:23:36, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:17,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▉ | 116/1019 [09:35<1:23:36, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:22,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▉ | 117/1019 [09:40<1:23:14, 5.54s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:22,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▉ | 117/1019 [09:40<1:23:14, 5.54s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:22,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 11%|████████▉ | 117/1019 [09:40<1:23:14, 5.54s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:27,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████ | 118/1019 [09:46<1:22:27, 5.49s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:27,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████ | 118/1019 [09:46<1:22:27, 5.49s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:27,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████ | 118/1019 [09:46<1:22:27, 5.49s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:33,270 >> Could not estimate the number of tokens 
of the input, floating-point operations will not be computed 12%|█████████ | 119/1019 [09:51<1:21:21, 5.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:33,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████ | 119/1019 [09:51<1:21:21, 5.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:33,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████ | 119/1019 [09:51<1:21:21, 5.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:38,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▏ | 120/1019 [09:56<1:21:08, 5.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:38,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▏ | 120/1019 [09:56<1:21:08, 5.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:38,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▏ | 120/1019 [09:56<1:21:08, 5.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:43,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▎ | 121/1019 [10:02<1:20:21, 5.37s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:43,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▎ | 121/1019 [10:02<1:20:21, 5.37s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:43,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▎ | 121/1019 [10:02<1:20:21, 5.37s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:49,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▎ | 121/1019 [10:02<1:20:21, 
5.37s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:49,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▎ | 122/1019 [10:07<1:19:44, 5.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:49,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▎ | 122/1019 [10:07<1:19:44, 5.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:54,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▎ | 122/1019 [10:07<1:19:44, 5.33s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:54,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▍ | 123/1019 [10:12<1:18:56, 5.29s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:54,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▍ | 123/1019 [10:12<1:18:56, 5.29s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:59,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▍ | 123/1019 [10:12<1:18:56, 5.29s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:59,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▍ | 124/1019 [10:17<1:18:17, 5.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:42:59,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▍ | 124/1019 [10:17<1:18:17, 5.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:04,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▍ | 124/1019 [10:17<1:18:17, 5.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:04,731 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 12%|█████████▌ | 125/1019 [10:22<1:17:29, 5.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:04,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▌ | 125/1019 [10:22<1:17:29, 5.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:09,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▌ | 125/1019 [10:22<1:17:29, 5.20s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:09,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▋ | 126/1019 [10:28<1:17:10, 5.19s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:09,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▋ | 126/1019 [10:28<1:17:10, 5.19s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:14,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▋ | 127/1019 [10:33<1:16:14, 5.13s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:14,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▋ | 127/1019 [10:33<1:16:14, 5.13s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:14,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▋ | 127/1019 [10:33<1:16:14, 5.13s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:19,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 12%|█████████▋ | 127/1019 [10:33<1:16:14, 5.13s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:19,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▊ | 128/1019 
[10:37<1:14:58, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:19,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▊ | 128/1019 [10:37<1:14:58, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:24,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▊ | 128/1019 [10:37<1:14:58, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:24,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▊ | 129/1019 [10:42<1:14:02, 4.99s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:24,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▊ | 129/1019 [10:42<1:14:02, 4.99s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:29,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▉ | 130/1019 [10:47<1:13:25, 4.96s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:29,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▉ | 130/1019 [10:47<1:13:25, 4.96s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:29,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▉ | 130/1019 [10:47<1:13:25, 4.96s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:34,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|█████████▉ | 130/1019 [10:47<1:13:25, 4.96s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:34,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████ | 131/1019 [10:52<1:12:45, 4.92s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:34,372 >> Could not 
estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████ | 131/1019 [10:52<1:12:45, 4.92s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:39,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████ | 131/1019 [10:52<1:12:45, 4.92s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:39,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████ | 132/1019 [10:57<1:11:48, 4.86s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:39,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████ | 132/1019 [10:57<1:11:48, 4.86s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:43,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████ | 132/1019 [10:57<1:11:48, 4.86s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:43,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▏ | 133/1019 [11:01<1:11:04, 4.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:43,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▏ | 133/1019 [11:01<1:11:04, 4.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:48,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▏ | 133/1019 [11:01<1:11:04, 4.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:48,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▏ | 133/1019 [11:01<1:11:04, 4.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:48,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
13%|██████████▎ | 134/1019 [11:06<1:10:19, 4.77s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:48,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▎ | 134/1019 [11:06<1:10:19, 4.77s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:53,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▎ | 134/1019 [11:06<1:10:19, 4.77s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:53,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▎ | 135/1019 [11:11<1:09:06, 4.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:53,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▎ | 135/1019 [11:11<1:09:06, 4.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:57,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▎ | 135/1019 [11:11<1:09:06, 4.69s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:57,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▍ | 136/1019 [11:15<1:08:04, 4.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:43:57,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▍ | 136/1019 [11:15<1:08:04, 4.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:02,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▍ | 136/1019 [11:15<1:08:04, 4.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:02,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▍ | 137/1019 [11:19<1:06:50, 4.55s/it][WARNING|modeling_utils.py:388] 
2022-02-28 22:44:02,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▍ | 137/1019 [11:19<1:06:50, 4.55s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:06,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 13%|██████████▍ | 137/1019 [11:19<1:06:50, 4.55s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:06,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▌ | 138/1019 [11:24<1:04:57, 4.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:06,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▌ | 138/1019 [11:24<1:04:57, 4.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:10,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▌ | 138/1019 [11:24<1:04:57, 4.42s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:10,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▋ | 139/1019 [11:28<1:03:04, 4.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:10,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▋ | 139/1019 [11:28<1:03:04, 4.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:14,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▋ | 139/1019 [11:28<1:03:04, 4.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:14,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▋ | 140/1019 [11:31<1:01:16, 4.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:14,395 >> Could not estimate the number of tokens of the input, 
floating-point operations will not be computed 14%|██████████▋ | 140/1019 [11:31<1:01:16, 4.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:18,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|██████████▋ | 140/1019 [11:31<1:01:16, 4.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:18,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████ | 141/1019 [11:35<58:45, 4.02s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:18,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 142/1019 [11:38<55:42, 3.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:21,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 142/1019 [11:38<55:42, 3.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:21,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 142/1019 [11:38<55:42, 3.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:24,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 142/1019 [11:38<55:42, 3.81s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:24,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▏ | 143/1019 [11:42<52:34, 3.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:24,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▎ | 144/1019 [11:44<49:23, 3.39s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:27,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▎ | 144/1019 [11:44<49:23, 
3.39s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:27,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▎ | 144/1019 [11:44<49:23, 3.39s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:30,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▎ | 144/1019 [11:44<49:23, 3.39s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:30,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▍ | 145/1019 [11:47<45:25, 3.12s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:32,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▍ | 145/1019 [11:47<45:25, 3.12s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:32,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▍ | 146/1019 [11:49<41:31, 2.85s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:35,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▍ | 146/1019 [11:49<41:31, 2.85s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:35,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▌ | 147/1019 [11:51<37:58, 2.61s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:36,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 14%|███████████▌ | 147/1019 [11:51<37:58, 2.61s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:36,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▌ | 148/1019 [11:53<34:28, 2.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:38,693 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 15%|███████████▌ | 148/1019 [11:53<34:28, 2.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:38,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 149/1019 [11:55<31:05, 2.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:40,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 149/1019 [11:55<31:05, 2.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:40,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 150/1019 [11:57<29:58, 2.07s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:40,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 150/1019 [11:57<29:58, 2.07s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:44,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 150/1019 [11:57<29:58, 2.07s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:44,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 151/1019 [12:03<48:41, 3.37s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:44,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 151/1019 [12:03<48:41, 3.37s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:50,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 152/1019 [12:09<1:00:03, 4.16s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:50,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 152/1019 
[12:09<1:00:03, 4.16s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:50,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 152/1019 [12:09<1:00:03, 4.16s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:56,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 153/1019 [12:15<1:07:56, 4.71s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:56,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 153/1019 [12:15<1:07:56, 4.71s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:44:56,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▋ | 153/1019 [12:15<1:07:56, 4.71s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 154/1019 [12:21<1:12:39, 5.04s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 154/1019 [12:21<1:12:39, 5.04s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3009, 'learning_rate': 3e-06, 'epoch': 0.15} 15%|███████████▊ | 154/1019 [12:21<1:12:39, 5.04s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 155/1019 [12:26<1:15:37, 5.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 155/1019 [12:26<1:15:37, 
5.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▊ | 155/1019 [12:26<1:15:37, 5.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 156/1019 [12:32<1:17:22, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 156/1019 [12:32<1:17:22, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 156/1019 [12:32<1:17:22, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 15%|███████████▉ | 156/1019 [12:32<1:17:22, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4029, 'learning_rate': 3.0600000000000003e-06, 'epoch': 0.15} 15%|███████████▉ | 156/1019 [12:32<1:17:22, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████ | 158/1019 [12:44<1:19:43, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████ | 158/1019 [12:44<1:19:43, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.5353, 'learning_rate': 3.08e-06, 'epoch': 
0.15} 16%|████████████▏ | 159/1019 [12:49<1:20:19, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▏ | 159/1019 [12:49<1:20:19, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3978, 'learning_rate': 3.1000000000000004e-06, 'epoch': 0.16} 16%|████████████▏ | 159/1019 [12:49<1:20:19, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▏ | 160/1019 [12:55<1:20:24, 5.62s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▏ | 160/1019 [12:55<1:20:24, 5.62s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▎ | 161/1019 [13:01<1:20:03, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▎ | 161/1019 [13:01<1:20:03, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3162, 'learning_rate': 3.1400000000000004e-06, 'epoch': 0.16} 16%|████████████▍ | 162/1019 [13:06<1:19:31, 5.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▍ | 162/1019 [13:06<1:19:31, 5.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the 
number of tokens of the input, floating-point operations will not be computed {'loss': 4.4252, 'learning_rate': 3.1600000000000002e-06, 'epoch': 0.16} 16%|████████████▍ | 162/1019 [13:06<1:19:31, 5.57s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▍ | 163/1019 [13:12<1:19:35, 5.58s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▍ | 163/1019 [13:12<1:19:35, 5.58s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▌ | 164/1019 [13:17<1:19:31, 5.58s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▌ | 164/1019 [13:17<1:19:31, 5.58s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3058, 'learning_rate': 3.2000000000000003e-06, 'epoch': 0.16} 16%|████████████▋ | 165/1019 [13:23<1:19:05, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▋ | 165/1019 [13:23<1:19:05, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3383, 'learning_rate': 3.2200000000000005e-06, 'epoch': 0.16} 16%|████████████▋ | 166/1019 [13:28<1:18:39, 5.53s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed 16%|████████████▋ | 166/1019 [13:28<1:18:39, 5.53s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2194, 'learning_rate': 3.2400000000000003e-06, 'epoch': 0.16} 16%|████████████▊ | 167/1019 [13:34<1:18:21, 5.52s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▊ | 167/1019 [13:34<1:18:21, 5.52s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2951, 'learning_rate': 3.2600000000000006e-06, 'epoch': 0.16} 16%|████████████▊ | 168/1019 [13:39<1:17:41, 5.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 16%|████████████▊ | 168/1019 [13:39<1:17:41, 5.48s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1291, 'learning_rate': 3.2800000000000004e-06, 'epoch': 0.16} 17%|████████████▉ | 169/1019 [13:44<1:17:04, 5.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|████████████▉ | 169/1019 [13:44<1:17:04, 5.44s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1544, 'learning_rate': 3.3000000000000006e-06, 'epoch': 0.17} 17%|█████████████ | 170/1019 [13:50<1:16:09, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will 
not be computed 17%|█████████████ | 170/1019 [13:50<1:16:09, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1989, 'learning_rate': 3.3200000000000004e-06, 'epoch': 0.17} 17%|█████████████ | 171/1019 [13:55<1:14:59, 5.31s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████ | 171/1019 [13:55<1:14:59, 5.31s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4741, 'learning_rate': 3.3400000000000006e-06, 'epoch': 0.17} 17%|█████████████▏ | 172/1019 [14:00<1:14:27, 5.27s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▏ | 172/1019 [14:00<1:14:27, 5.27s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3015, 'learning_rate': 3.3600000000000004e-06, 'epoch': 0.17} 17%|█████████████▏ | 173/1019 [14:05<1:13:51, 5.24s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▏ | 173/1019 [14:05<1:13:51, 5.24s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2859, 'learning_rate': 3.3800000000000007e-06, 'epoch': 0.17} 17%|█████████████▎ | 174/1019 [14:10<1:12:34, 5.15s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed 17%|█████████████▎ | 174/1019 [14:10<1:12:34, 5.15s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3791, 'learning_rate': 3.4000000000000005e-06, 'epoch': 0.17} 17%|█████████████▍ | 175/1019 [14:15<1:11:51, 5.11s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▍ | 175/1019 [14:15<1:11:51, 5.11s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3456, 'learning_rate': 3.4200000000000007e-06, 'epoch': 0.17} 17%|█████████████▍ | 176/1019 [14:20<1:11:00, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▍ | 176/1019 [14:20<1:11:00, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.203, 'learning_rate': 3.44e-06, 'epoch': 0.17} 17%|█████████████▍ | 176/1019 [14:20<1:11:00, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▌ | 177/1019 [14:25<1:10:42, 5.04s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▌ | 177/1019 [14:25<1:10:42, 5.04s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▋ | 178/1019 
[14:30<1:09:45, 4.98s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 17%|█████████████▋ | 178/1019 [14:30<1:09:45, 4.98s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.359, 'learning_rate': 3.48e-06, 'epoch': 0.17} 18%|█████████████▋ | 179/1019 [14:35<1:08:32, 4.90s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|█████████████▋ | 179/1019 [14:35<1:08:32, 4.90s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2511, 'learning_rate': 3.5e-06, 'epoch': 0.18} 18%|█████████████▊ | 180/1019 [14:39<1:07:47, 4.85s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|█████████████▊ | 180/1019 [14:39<1:07:47, 4.85s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4222, 'learning_rate': 3.52e-06, 'epoch': 0.18} 18%|█████████████▊ | 181/1019 [14:44<1:07:17, 4.82s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|█████████████▊ | 181/1019 [14:44<1:07:17, 4.82s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.47, 'learning_rate': 3.54e-06, 'epoch': 0.18} 18%|█████████████▉ | 182/1019 [14:49<1:07:00, 
4.80s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|█████████████▉ | 182/1019 [14:49<1:07:00, 4.80s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4664, 'learning_rate': 3.5600000000000002e-06, 'epoch': 0.18} 18%|█████████████▉ | 182/1019 [14:49<1:07:00, 4.80s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████ | 183/1019 [14:53<1:06:04, 4.74s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████ | 183/1019 [14:53<1:06:04, 4.74s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.315, 'learning_rate': 3.6000000000000003e-06, 'epoch': 0.18} 18%|██████████████▏ | 185/1019 [15:02<1:04:06, 4.61s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 18%|██████████████▏ | 185/1019 [15:02<1:04:06, 4.61s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:45:02,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2008, 'learning_rate': 3.62e-06, 'epoch': 0.18} 
{'loss': 4.4038, 'learning_rate': 3.6400000000000003e-06, 'epoch': 0.18}
18%|██████████████▎ | 187/1019 [15:11<1:01:46, 4.46s/it]
18%|██████████████▍ | 188/1019 [15:15<1:00:49, 4.39s/it]
19%|██████████████▊ | 189/1019 [15:19<59:19, 4.29s/it]
19%|██████████████▉ | 190/1019 [15:23<57:36, 4.17s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:48:11,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4258, 'learning_rate': 3.74e-06, 'epoch': 0.19}
19%|███████████████ | 192/1019 [15:30<53:09, 3.86s/it]
{'loss': 4.3538, 'learning_rate': 3.7600000000000004e-06, 'epoch': 0.19}
19%|███████████████▏ | 193/1019 [15:34<50:30, 3.67s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:48:21,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4872, 'learning_rate': 3.8000000000000005e-06, 'epoch': 0.19}
19%|███████████████▎ | 195/1019 [15:39<44:46, 3.26s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:48:25,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
19%|███████████████▍ | 196/1019 [15:42<41:42, 3.04s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:48:27,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
19%|███████████████▍ | 197/1019 [15:44<38:48, 2.83s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:48:31,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6079, 'learning_rate': 3.88e-06, 'epoch': 0.19}
[WARNING|modeling_utils.py:388] 2022-02-28 22:48:32,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 22:48:34,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
20%|███████████████▊ | 201/1019 [15:56<46:15, 3.39s/it]
20%|███████████████▊ | 202/1019 [16:02<56:49, 4.17s/it]
20%|███████████████▌ | 203/1019 [16:08<1:03:43, 4.69s/it]
{'loss': 4.1678, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.2}
20%|███████████████▋ | 205/1019 [16:20<1:11:17, 5.25s/it]
{'loss': 4.09, 'learning_rate': 4.0200000000000005e-06, 'epoch': 0.2}
20%|███████████████▊ | 206/1019 [16:25<1:13:30, 5.42s/it]
{'loss': 4.1526, 'learning_rate': 4.04e-06, 'epoch': 0.2}
20%|███████████████▊ | 207/1019 [16:31<1:14:27, 5.50s/it]
20%|███████████████▉ | 208/1019 [16:37<1:15:08, 5.56s/it]
{'loss': 4.4409, 'learning_rate': 4.1e-06, 'epoch': 0.21}
21%|████████████████ | 210/1019 [16:48<1:15:21, 5.59s/it]
{'loss': 4.2961, 'learning_rate': 4.12e-06, 'epoch': 0.21}
21%|████████████████▏ | 211/1019 [16:54<1:15:42, 5.62s/it]
21%|████████████████▏ | 212/1019 [16:59<1:15:33, 5.62s/it]
21%|████████████████▎ | 213/1019 [17:05<1:14:57, 5.58s/it]
{'loss': 4.1999, 'learning_rate': 4.18e-06, 'epoch': 0.21}
21%|████████████████▍ | 214/1019 [17:10<1:14:26, 5.55s/it]
{'loss': 4.3546, 'learning_rate': 4.2000000000000004e-06, 'epoch': 0.21}
21%|████████████████▍ | 215/1019 [17:16<1:13:51, 5.51s/it]
21%|████████████████▌ | 216/1019 [17:21<1:13:07, 5.46s/it]
{'loss': 4.3463, 'learning_rate': 4.24e-06, 'epoch': 0.21}
21%|████████████████▌ | 217/1019 [17:26<1:12:37, 5.43s/it]
{'loss': 4.3692, 'learning_rate': 4.26e-06, 'epoch': 0.21}
21%|████████████████▋ | 218/1019 [17:32<1:12:25, 5.43s/it]
21%|████████████████▊ | 219/1019 [17:37<1:12:02, 5.40s/it]
{'loss': 4.2891, 'learning_rate': 4.3e-06, 'epoch': 0.21}
22%|████████████████▊ | 220/1019 [17:42<1:11:20, 5.36s/it]
{'loss': 4.1206, 'learning_rate': 4.34e-06, 'epoch': 0.22}
22%|████████████████▉ | 222/1019 [17:53<1:09:10, 5.21s/it]
22%|█████████████████ | 223/1019 [17:58<1:08:24, 5.16s/it]
{'loss': 4.178, 'learning_rate': 4.38e-06, 'epoch': 0.22}
22%|█████████████████▏ | 224/1019 [18:03<1:07:36, 5.10s/it]
22%|█████████████████▏ | 225/1019 [18:08<1:07:14, 5.08s/it]
{'loss': 4.2947, 'learning_rate': 4.42e-06, 'epoch': 0.22}
22%|█████████████████▎ | 226/1019 [18:13<1:06:44, 5.05s/it]
22%|█████████████████▍ | 227/1019 [18:18<1:06:37, 5.05s/it]
{'loss': 4.2056, 'learning_rate': 4.4600000000000005e-06, 'epoch': 0.22}
22%|█████████████████▍ | 228/1019 [18:23<1:05:47, 4.99s/it]
{'loss': 4.127, 'learning_rate': 4.48e-06, 'epoch': 0.22}
22%|█████████████████▌ | 229/1019 [18:27<1:04:56, 4.93s/it]
23%|█████████████████▌ | 230/1019 [18:32<1:04:23, 4.90s/it]
{'loss': 4.2538, 'learning_rate': 4.520000000000001e-06, 'epoch': 0.23}
23%|█████████████████▋ | 231/1019 [18:37<1:03:28, 4.83s/it]
{'loss': 4.132, 'learning_rate': 4.540000000000001e-06, 'epoch': 0.23}
23%|█████████████████▊ | 232/1019 [18:42<1:03:13, 4.82s/it]
{'loss': 4.2221, 'learning_rate': 4.56e-06, 'epoch': 0.23}
23%|█████████████████▊ | 233/1019 [18:46<1:02:19, 4.76s/it]
{'loss': 4.3872, 'learning_rate': 4.58e-06, 'epoch': 0.23}
23%|█████████████████▉ | 234/1019 [18:51<1:01:20, 4.69s/it]
{'loss': 4.1556, 'learning_rate': 4.600000000000001e-06, 'epoch': 0.23}
{'loss': 4.4018, 'learning_rate': 4.620000000000001e-06, 'epoch': 0.23}
{'loss': 4.2207, 'learning_rate': 4.6400000000000005e-06, 'epoch': 0.23}
[WARNING|modeling_utils.py:388] 2022-02-28 22:51:48,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2231, 'learning_rate': 4.66e-06, 'epoch': 0.23}
23%|██████████████████▋ | 238/1019 [19:08<57:23, 4.41s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:51:55,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
23%|██████████████████▊ | 239/1019 [19:12<56:02, 4.31s/it]
{'loss': 4.3018, 'learning_rate': 4.7e-06, 'epoch': 0.23}
24%|██████████████████▊ | 240/1019 [19:16<54:11, 4.17s/it]
{'loss': 4.5058, 'learning_rate': 4.7200000000000005e-06, 'epoch': 0.24}
24%|██████████████████▉ | 241/1019 [19:20<52:21, 4.04s/it]
24%|██████████████████▉ | 242/1019 [19:23<49:46, 3.84s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:52:11,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|███████████████████▏ | 244/1019 [19:29<43:29, 3.37s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:52:15,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|███████████████████▏ | 245/1019 [19:32<40:18, 3.12s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:52:17,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|███████████████████▎ | 246/1019 [19:34<37:08, 2.88s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:52:19,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|███████████████████▍ | 247/1019 [19:36<33:45, 2.62s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:52:21,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|███████████████████▍ | 248/1019 [19:38<30:18, 2.36s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:52:23,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|███████████████████▌ | 249/1019 [19:39<27:21, 2.13s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
25%|███████████████████▋ | 250/1019 [19:41<26:26, 2.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▋ | 250/1019 [19:41<26:26, 2.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▋ | 250/1019 [19:41<26:26, 2.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▋ | 250/1019 [19:41<26:26, 2.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.0626, 'learning_rate': 4.94e-06, 'epoch': 0.25} 25%|███████████████████▋ | 250/1019 [19:41<26:26, 2.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▋ | 250/1019 [19:41<26:26, 2.06s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▊ | 252/1019 [19:54<52:54, 4.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▊ | 252/1019 [19:54<52:54, 4.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▊ | 252/1019 [19:54<52:54, 4.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed 25%|███████████████████▊ | 252/1019 [19:54<52:54, 4.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1374, 'learning_rate': 4.980000000000001e-06, 'epoch': 0.25} 25%|███████████████████▊ | 252/1019 [19:54<52:54, 4.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▍ | 254/1019 [20:05<1:04:09, 5.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▍ | 254/1019 [20:05<1:04:09, 5.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 3.9305, 'learning_rate': 5e-06, 'epoch': 0.25} 25%|███████████████████▍ | 254/1019 [20:05<1:04:09, 5.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▌ | 255/1019 [20:11<1:07:00, 5.26s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▌ | 255/1019 [20:11<1:07:00, 5.26s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▌ | 255/1019 [20:11<1:07:00, 5.26s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▌ | 256/1019 [20:17<1:09:05, 
5.43s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▌ | 256/1019 [20:17<1:09:05, 5.43s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▌ | 256/1019 [20:17<1:09:05, 5.43s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▋ | 257/1019 [20:23<1:09:56, 5.51s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▋ | 257/1019 [20:23<1:09:56, 5.51s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▋ | 258/1019 [20:28<1:10:29, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▋ | 258/1019 [20:28<1:10:29, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 3.9802, 'learning_rate': 5.0800000000000005e-06, 'epoch': 0.25} 25%|███████████████████▊ | 259/1019 [20:34<1:10:58, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 25%|███████████████████▊ | 259/1019 [20:34<1:10:58, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not 
be computed {'loss': 4.0783, 'learning_rate': 5.1e-06, 'epoch': 0.25} 26%|███████████████████▉ | 260/1019 [20:40<1:11:11, 5.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|███████████████████▉ | 260/1019 [20:40<1:11:11, 5.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1619, 'learning_rate': 5.12e-06, 'epoch': 0.26} 26%|███████████████████▉ | 260/1019 [20:40<1:11:11, 5.63s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|███████████████████▉ | 261/1019 [20:45<1:10:48, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|███████████████████▉ | 261/1019 [20:45<1:10:48, 5.60s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████ | 262/1019 [20:51<1:10:09, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████ | 262/1019 [20:51<1:10:09, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1806, 'learning_rate': 5.1600000000000006e-06, 'epoch': 0.26} 26%|████████████████████▏ | 263/1019 [20:56<1:09:58, 5.55s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▏ | 
263/1019 [20:56<1:09:58, 5.55s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2457, 'learning_rate': 5.18e-06, 'epoch': 0.26} 26%|████████████████████▏ | 263/1019 [20:56<1:09:58, 5.55s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▏ | 264/1019 [21:02<1:10:00, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▏ | 264/1019 [21:02<1:10:00, 5.56s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▎ | 265/1019 [21:07<1:09:25, 5.52s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▎ | 265/1019 [21:07<1:09:25, 5.52s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1241, 'learning_rate': 5.220000000000001e-06, 'epoch': 0.26} 26%|████████████████████▎ | 266/1019 [21:13<1:08:57, 5.49s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▎ | 266/1019 [21:13<1:08:57, 5.49s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3891, 'learning_rate': 5.240000000000001e-06, 'epoch': 0.26} 26%|████████████████████▎ | 266/1019 
[21:13<1:08:57, 5.49s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▍ | 267/1019 [21:18<1:08:30, 5.47s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▍ | 267/1019 [21:18<1:08:30, 5.47s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▌ | 268/1019 [21:23<1:07:38, 5.40s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▌ | 268/1019 [21:23<1:07:38, 5.40s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3778, 'learning_rate': 5.28e-06, 'epoch': 0.26} 26%|████████████████████▌ | 269/1019 [21:29<1:07:35, 5.41s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▌ | 269/1019 [21:29<1:07:35, 5.41s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1025, 'learning_rate': 5.300000000000001e-06, 'epoch': 0.26} 26%|████████████████████▋ | 270/1019 [21:34<1:07:06, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 26%|████████████████████▋ | 270/1019 [21:34<1:07:06, 5.38s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could 
not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2955, 'learning_rate': 5.320000000000001e-06, 'epoch': 0.26} 27%|████████████████████▋ | 271/1019 [21:39<1:06:19, 5.32s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|████████████████████▋ | 271/1019 [21:39<1:06:19, 5.32s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2255, 'learning_rate': 5.3400000000000005e-06, 'epoch': 0.27} 27%|████████████████████▊ | 272/1019 [21:44<1:05:22, 5.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|████████████████████▊ | 272/1019 [21:44<1:05:22, 5.25s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1709, 'learning_rate': 5.36e-06, 'epoch': 0.27} 27%|████████████████████▉ | 273/1019 [21:50<1:04:55, 5.22s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|████████████████████▉ | 273/1019 [21:50<1:04:55, 5.22s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2409, 'learning_rate': 5.380000000000001e-06, 'epoch': 0.27} 27%|████████████████████▉ | 274/1019 [21:55<1:04:38, 5.21s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|████████████████████▉ | 274/1019 [21:55<1:04:38, 
5.21s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3363, 'learning_rate': 5.400000000000001e-06, 'epoch': 0.27} 27%|█████████████████████ | 275/1019 [22:00<1:03:42, 5.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|█████████████████████ | 275/1019 [22:00<1:03:42, 5.14s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1415, 'learning_rate': 5.420000000000001e-06, 'epoch': 0.27} 27%|█████████████████████▏ | 276/1019 [22:05<1:03:27, 5.12s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|█████████████████████▏ | 276/1019 [22:05<1:03:27, 5.12s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.6055, 'learning_rate': 5.4400000000000004e-06, 'epoch': 0.27} 27%|█████████████████████▏ | 277/1019 [22:10<1:02:45, 5.07s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|█████████████████████▏ | 277/1019 [22:10<1:02:45, 5.07s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2672, 'learning_rate': 5.460000000000001e-06, 'epoch': 0.27} 27%|█████████████████████▎ | 278/1019 [22:15<1:02:19, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed 27%|█████████████████████▎ | 278/1019 [22:15<1:02:19, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1264, 'learning_rate': 5.480000000000001e-06, 'epoch': 0.27} 27%|█████████████████████▎ | 278/1019 [22:15<1:02:19, 5.05s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|█████████████████████▎ | 279/1019 [22:20<1:02:04, 5.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|█████████████████████▎ | 279/1019 [22:20<1:02:04, 5.03s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|█████████████████████▍ | 280/1019 [22:25<1:01:40, 5.01s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 27%|█████████████████████▍ | 280/1019 [22:25<1:01:40, 5.01s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1477, 'learning_rate': 5.5200000000000005e-06, 'epoch': 0.27} 28%|█████████████████████▌ | 281/1019 [22:29<1:00:47, 4.94s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|█████████████████████▌ | 281/1019 [22:29<1:00:47, 4.94s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.264, 'learning_rate': 
5.540000000000001e-06, 'epoch': 0.28} 28%|█████████████████████▌ | 281/1019 [22:29<1:00:47, 4.94s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|█████████████████████▌ | 282/1019 [22:34<1:00:03, 4.89s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|█████████████████████▌ | 282/1019 [22:34<1:00:03, 4.89s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|█████████████████████▌ | 282/1019 [22:34<1:00:03, 4.89s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▏ | 283/1019 [22:39<59:11, 4.83s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▏ | 283/1019 [22:39<59:11, 4.83s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▏ | 283/1019 [22:39<59:11, 4.83s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▎ | 284/1019 [22:43<58:15, 4.76s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▎ | 284/1019 [22:43<58:15, 4.76s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, 
floating-point operations will not be computed 28%|██████████████████████▎ | 285/1019 [22:48<57:29, 4.70s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▎ | 285/1019 [22:48<57:29, 4.70s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:55:37,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:55:37,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1766, 'learning_rate': 5.64e-06, 'epoch': 0.28} 28%|██████████████████████▌ | 287/1019 [22:57<55:26, 4.54s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▌ | 287/1019 [22:57<55:26, 4.54s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3387, 'learning_rate': 5.66e-06, 'epoch': 0.28} 28%|██████████████████████▌ | 287/1019 [22:57<55:26, 4.54s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▌ | 288/1019 [23:01<54:10, 4.45s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of 
the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:55:49,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:55:49,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3374, 'learning_rate': 5.7e-06, 'epoch': 0.28} 28%|██████████████████████▊ | 290/1019 [23:09<50:54, 4.19s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 28%|██████████████████████▊ | 290/1019 [23:09<50:54, 4.19s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3139, 'learning_rate': 5.72e-06, 'epoch': 0.28} 29%|██████████████████████▊ | 291/1019 [23:13<48:44, 4.02s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 29%|██████████████████████▊ | 291/1019 [23:13<48:44, 4.02s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3346, 'learning_rate': 5.74e-06, 'epoch': 0.29} 29%|██████████████████████▊ | 291/1019 [23:13<48:44, 4.02s/it]g-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 29%|██████████████████████▉ | 292/1019 [23:16<46:22, 3.83s/it]g-point operations will not be 
computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:56:03,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:56:03,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3369, 'learning_rate': 5.78e-06, 'epoch': 0.29} [WARNING|modeling_utils.py:388] 2022-02-28 22:56:03,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:52:24,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 29%|███████████████████████ | 294/1019 [23:22<41:06, 3.40s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 29%|███████████████████████▏ | 295/1019 [23:25<38:24, 3.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 29%|███████████████████████▏ | 295/1019 [23:25<38:24, 3.18s/it][WARNING|modeling_utils.py:388] 2022-02-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:56:11,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
[WARNING|modeling_utils.py:388] 2022-02-28 22:56:11,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:56:13,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:56:15,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 22:56:17,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388]
2022-02-28 22:56:19,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4075, 'learning_rate': 5.92e-06, 'epoch': 0.29} 30%|███████████████████████▋ | 301/1019 [23:41<39:34, 3.31s/it] {'loss': 4.3791, 'learning_rate': 5.94e-06, 'epoch': 0.3} 30%|███████████████████████▋ | 302/1019 [23:47<48:56, 4.10s/it]
30%|███████████████████████▊ | 303/1019 [23:52<55:14, 4.63s/it] 30%|███████████████████████▊ | 304/1019 [23:58<59:26, 4.99s/it] {'loss': 4.3519, 'learning_rate': 6e-06, 'epoch': 0.3} 30%|███████████████████████▎ | 305/1019 [24:04<1:02:09, 5.22s/it] {'loss': 4.0116, 'learning_rate': 6.02e-06, 'epoch': 0.3} 30%|███████████████████████▍ | 306/1019 [24:10<1:04:04, 5.39s/it] {'loss': 4.3547, 'learning_rate': 6.040000000000001e-06, 'epoch': 0.3}
30%|███████████████████████▍ | 307/1019 [24:16<1:05:16, 5.50s/it] {'loss': 4.2222, 'learning_rate': 6.0600000000000004e-06, 'epoch': 0.3} 30%|███████████████████████▌ | 308/1019 [24:21<1:05:59, 5.57s/it] 30%|███████████████████████▋ | 309/1019 [24:27<1:06:30, 5.62s/it] {'loss': 4.2797, 'learning_rate': 6.1e-06, 'epoch': 0.3} 30%|███████████████████████▋ | 310/1019 [24:33<1:06:39,
5.64s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4489, 'learning_rate': 6.120000000000001e-06, 'epoch': 0.3} 31%|███████████████████████▊ | 311/1019 [24:38<1:06:23, 5.63s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|███████████████████████▊ | 311/1019 [24:38<1:06:23, 5.63s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1933, 'learning_rate': 6.1400000000000005e-06, 'epoch': 0.31} 31%|███████████████████████▉ | 312/1019 [24:44<1:06:00, 5.60s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|███████████████████████▉ | 312/1019 [24:44<1:06:00, 5.60s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1103, 'learning_rate': 6.16e-06, 'epoch': 0.31} 31%|███████████████████████▉ | 312/1019 [24:44<1:06:00, 5.60s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|███████████████████████▉ | 312/1019 [24:44<1:06:00, 5.60s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.24, 'learning_rate': 6.18e-06, 'epoch': 0.31} 31%|███████████████████████▉ | 312/1019 [24:44<1:06:00, 5.60s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed 31%|████████████████████████ | 314/1019 [24:55<1:04:36, 5.50s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████ | 314/1019 [24:55<1:04:36, 5.50s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2906, 'learning_rate': 6.200000000000001e-06, 'epoch': 0.31} 31%|████████████████████████ | 314/1019 [24:55<1:04:36, 5.50s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████ | 315/1019 [25:00<1:04:18, 5.48s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████ | 315/1019 [25:00<1:04:18, 5.48s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▏ | 316/1019 [25:06<1:03:57, 5.46s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▏ | 316/1019 [25:06<1:03:57, 5.46s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2807, 'learning_rate': 6.24e-06, 'epoch': 0.31} 31%|████████████████████████▎ | 317/1019 [25:11<1:04:03, 5.48s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▎ | 317/1019 [25:11<1:04:03, 
5.48s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2228, 'learning_rate': 6.26e-06, 'epoch': 0.31} 31%|████████████████████████▎ | 317/1019 [25:11<1:04:03, 5.48s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▎ | 318/1019 [25:16<1:03:37, 5.45s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▎ | 318/1019 [25:16<1:03:37, 5.45s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▍ | 319/1019 [25:22<1:02:45, 5.38s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▍ | 319/1019 [25:22<1:02:45, 5.38s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2356, 'learning_rate': 6.300000000000001e-06, 'epoch': 0.31} 31%|████████████████████████▍ | 319/1019 [25:22<1:02:45, 5.38s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▍ | 320/1019 [25:27<1:02:23, 5.36s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 31%|████████████████████████▍ | 320/1019 [25:27<1:02:23, 5.36s/it]g-point operations will not be computed-28 
22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|████████████████████████▌ | 321/1019 [25:32<1:01:50, 5.32s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|████████████████████████▌ | 321/1019 [25:32<1:01:50, 5.32s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1898, 'learning_rate': 6.34e-06, 'epoch': 0.31} 32%|████████████████████████▋ | 322/1019 [25:37<1:01:17, 5.28s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|████████████████████████▋ | 322/1019 [25:37<1:01:17, 5.28s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1278, 'learning_rate': 6.360000000000001e-06, 'epoch': 0.32} 32%|████████████████████████▋ | 323/1019 [25:43<1:00:44, 5.24s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|████████████████████████▋ | 323/1019 [25:43<1:00:44, 5.24s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.088, 'learning_rate': 6.380000000000001e-06, 'epoch': 0.32} 32%|████████████████████████▋ | 323/1019 [25:43<1:00:44, 5.24s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|████████████████████████▊ | 324/1019 [25:48<1:00:04, 5.19s/it]g-point operations will not be 
computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|████████████████████████▊ | 324/1019 [25:48<1:00:04, 5.19s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|█████████████████████████▌ | 325/1019 [25:53<59:13, 5.12s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|█████████████████████████▌ | 325/1019 [25:53<59:13, 5.12s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2676, 'learning_rate': 6.42e-06, 'epoch': 0.32} 32%|█████████████████████████▌ | 326/1019 [25:58<58:40, 5.08s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|█████████████████████████▌ | 326/1019 [25:58<58:40, 5.08s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2007, 'learning_rate': 6.440000000000001e-06, 'epoch': 0.32} 32%|█████████████████████████▋ | 327/1019 [26:02<58:05, 5.04s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|█████████████████████████▋ | 327/1019 [26:02<58:05, 5.04s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1924, 'learning_rate': 6.460000000000001e-06, 'epoch': 0.32} 32%|█████████████████████████▊ | 328/1019 [26:07<57:28, 4.99s/it]g-point operations will 
not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|█████████████████████████▊ | 328/1019 [26:07<57:28, 4.99s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2115, 'learning_rate': 6.480000000000001e-06, 'epoch': 0.32} 32%|█████████████████████████▊ | 329/1019 [26:12<56:35, 4.92s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|█████████████████████████▊ | 329/1019 [26:12<56:35, 4.92s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2116, 'learning_rate': 6.5000000000000004e-06, 'epoch': 0.32} 32%|█████████████████████████▉ | 330/1019 [26:17<56:12, 4.89s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|█████████████████████████▉ | 330/1019 [26:17<56:12, 4.89s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2854, 'learning_rate': 6.520000000000001e-06, 'epoch': 0.32} 32%|█████████████████████████▉ | 331/1019 [26:22<55:46, 4.86s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 32%|█████████████████████████▉ | 331/1019 [26:22<55:46, 4.86s/it]g-point operations will not be computed-28 22:56:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1604, 'learning_rate': 6.540000000000001e-06, 'epoch': 0.32} 
33%|██████████████████████████ | 332/1019 [26:26<55:05, 4.81s/it]
{'loss': 4.3525, 'learning_rate': 6.560000000000001e-06, 'epoch': 0.33}
33%|██████████████████████████▏ | 333/1019 [26:31<54:23, 4.76s/it]
{'loss': 4.012, 'learning_rate': 6.5800000000000005e-06, 'epoch': 0.33}
33%|██████████████████████████▏ | 334/1019 [26:36<53:34, 4.69s/it]
{'loss': 4.3344, 'learning_rate': 6.600000000000001e-06, 'epoch': 0.33}
33%|██████████████████████████▎ | 335/1019 [26:40<52:39, 4.62s/it]
{'loss': 4.3529, 'learning_rate': 6.620000000000001e-06, 'epoch': 0.33}
33%|██████████████████████████▍ | 336/1019 [26:44<51:28, 4.52s/it]
{'loss': 4.1345, 'learning_rate': 6.640000000000001e-06, 'epoch': 0.33}
33%|██████████████████████████▍ | 337/1019 [26:49<50:39, 4.46s/it]
33%|██████████████████████████▌ | 338/1019 [26:53<49:33, 4.37s/it]
33%|██████████████████████████▌ | 339/1019 [26:57<48:15, 4.26s/it]
33%|██████████████████████████▋ | 340/1019 [27:01<46:38, 4.12s/it]
33%|██████████████████████████▊ | 341/1019 [27:04<44:45, 3.96s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:59:52,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2346, 'learning_rate': 6.760000000000001e-06, 'epoch': 0.34}
34%|██████████████████████████▉ | 343/1019 [27:11<40:40, 3.61s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 22:59:58,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3741, 'learning_rate': 6.800000000000001e-06, 'epoch': 0.34}
34%|███████████████████████████ | 345/1019 [27:16<35:33, 3.17s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
34%|███████████████████████████▏ | 346/1019 [27:19<32:55, 2.93s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:00:05,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:00:07,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4325, 'learning_rate': 6.88e-06, 'epoch': 0.34}
[WARNING|modeling_utils.py:388] 2022-02-28 23:00:09,068 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:00:10,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
34%|███████████████████████████▌ | 351/1019 [27:32<37:01, 3.33s/it]
35%|███████████████████████████▋ | 352/1019 [27:38<45:30, 4.09s/it]
{'loss': 4.1596, 'learning_rate': 6.96e-06, 'epoch': 0.35}
35%|███████████████████████████▋ | 353/1019 [27:44<51:02, 4.60s/it]
35%|███████████████████████████▊ | 354/1019 [27:50<54:53, 4.95s/it]
{'loss': 4.2482, 'learning_rate': 7e-06, 'epoch': 0.35}
35%|███████████████████████████▊ | 355/1019 [27:56<57:23, 5.19s/it]
35%|███████████████████████████▊ | 355/1019 [27:56<57:23, 5.19s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▉ | 356/1019 [28:01<58:52, 5.33s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▉ | 356/1019 [28:01<58:52, 5.33s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▎ | 357/1019 [28:07<1:00:11, 5.46s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▎ | 357/1019 [28:07<1:00:11, 5.46s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1397, 'learning_rate': 7.06e-06, 'epoch': 0.35} 35%|███████████████████████████▍ | 358/1019 [28:13<1:00:42, 5.51s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▍ | 358/1019 [28:13<1:00:42, 5.51s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.4043, 'learning_rate': 7.08e-06, 'epoch': 0.35} 35%|███████████████████████████▍ | 358/1019 [28:13<1:00:42, 5.51s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▍ | 359/1019 
[28:18<1:01:05, 5.55s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▍ | 359/1019 [28:18<1:01:05, 5.55s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▍ | 359/1019 [28:18<1:01:05, 5.55s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▌ | 360/1019 [28:24<1:01:19, 5.58s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▌ | 360/1019 [28:24<1:01:19, 5.58s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▌ | 360/1019 [28:24<1:01:19, 5.58s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▋ | 361/1019 [28:29<1:00:58, 5.56s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 35%|███████████████████████████▋ | 361/1019 [28:29<1:00:58, 5.56s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|███████████████████████████▋ | 362/1019 [28:35<1:00:34, 5.53s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed 36%|███████████████████████████▋ | 362/1019 [28:35<1:00:34, 5.53s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3606, 'learning_rate': 7.16e-06, 'epoch': 0.36} 36%|███████████████████████████▋ | 362/1019 [28:35<1:00:34, 5.53s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|███████████████████████████▊ | 363/1019 [28:40<1:00:04, 5.49s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|███████████████████████████▊ | 363/1019 [28:40<1:00:04, 5.49s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▌ | 364/1019 [28:46<59:30, 5.45s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▌ | 364/1019 [28:46<59:30, 5.45s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2427, 'learning_rate': 7.2000000000000005e-06, 'epoch': 0.36} 36%|████████████████████████████▋ | 365/1019 [28:51<58:54, 5.40s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 36%|████████████████████████████▋ | 365/1019 [28:51<58:54, 5.40s/it]g-point operations will not be computed-28 23:00:02,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1021, 
'learning_rate': 7.22e-06, 'epoch': 0.36}
36%|████████████████████████████▋ | 366/1019 [28:56<58:49, 5.41s/it]
{'loss': 4.2122, 'learning_rate': 7.24e-06, 'epoch': 0.36}
36%|████████████████████████████▊ | 367/1019 [29:02<58:32, 5.39s/it]
{'loss': 4.0173, 'learning_rate': 7.260000000000001e-06, 'epoch': 0.36}
36%|████████████████████████████▉ | 368/1019 [29:07<58:17, 5.37s/it]
{'loss': 4.1963, 'learning_rate': 7.280000000000001e-06, 'epoch': 0.36}
36%|████████████████████████████▉ | 369/1019 [29:12<58:11, 5.37s/it]
{'loss': 4.2375, 'learning_rate': 7.3e-06, 'epoch': 0.36}
36%|█████████████████████████████ | 370/1019 [29:18<57:50, 5.35s/it]
{'loss': 4.091, 'learning_rate': 7.32e-06, 'epoch': 0.36}
36%|█████████████████████████████▏ | 371/1019 [29:23<56:57, 5.27s/it]
{'loss': 4.424, 'learning_rate': 7.340000000000001e-06, 'epoch': 0.36}
37%|█████████████████████████████▏ | 372/1019 [29:28<56:04, 5.20s/it]
{'loss': 4.1969, 'learning_rate': 7.360000000000001e-06, 'epoch': 0.36}
37%|█████████████████████████████▎ | 373/1019 [29:33<55:40, 
5.17s/it]
37%|█████████████████████████████▎ | 374/1019 [29:38<55:23, 5.15s/it]
{'loss': 4.0861, 'learning_rate': 7.4e-06, 'epoch': 0.37}
37%|█████████████████████████████▍ | 375/1019 [29:43<54:53, 5.11s/it]
37%|█████████████████████████████▌ | 376/1019 [29:48<54:21, 5.07s/it]
{'loss': 4.3203, 'learning_rate': 7.440000000000001e-06, 'epoch': 0.37}
37%|█████████████████████████████▌ | 377/1019 [29:53<53:55, 5.04s/it]
37%|█████████████████████████████▋ | 378/1019 [29:58<53:19, 4.99s/it]
{'loss': 4.1449, 'learning_rate': 7.48e-06, 'epoch': 0.37}
37%|█████████████████████████████▊ | 379/1019 [30:03<52:56, 4.96s/it]
{'loss': 4.3008, 'learning_rate': 7.500000000000001e-06, 'epoch': 0.37}
37%|█████████████████████████████▊ | 380/1019 [30:08<52:32, 4.93s/it]
37%|█████████████████████████████▉ | 381/1019 [30:12<51:56, 4.88s/it]
{'loss': 4.1243, 'learning_rate': 7.540000000000001e-06, 'epoch': 0.37}
37%|█████████████████████████████▉ | 382/1019 [30:17<51:26, 4.85s/it]
{'loss': 4.2132, 'learning_rate': 7.5600000000000005e-06, 'epoch': 0.37}
38%|██████████████████████████████ | 383/1019 [30:22<50:47, 4.79s/it]
38%|██████████████████████████████▏ | 384/1019 [30:26<50:05, 4.73s/it]
38%|██████████████████████████████▏ | 385/1019 [30:31<49:15, 4.66s/it]
{'loss': 4.3407, 'learning_rate': 7.620000000000001e-06, 'epoch': 0.38}
38%|██████████████████████████████▎ | 386/1019 [30:35<48:18, 4.58s/it]
{'loss': 
4.2512, 'learning_rate': 7.640000000000001e-06, 'epoch': 0.38}
38%|██████████████████████████████▍ | 387/1019 [30:40<47:16, 4.49s/it]
{'loss': 4.0738, 'learning_rate': 7.660000000000001e-06, 'epoch': 0.38}
38%|██████████████████████████████▍ | 388/1019 [30:44<46:13, 4.40s/it]
{'loss': 4.297, 'learning_rate': 7.680000000000001e-06, 'epoch': 0.38}
38%|██████████████████████████████▌ | 389/1019 [30:48<45:09, 4.30s/it]
{'loss': 4.0619, 'learning_rate': 7.7e-06, 'epoch': 0.38}
38%|██████████████████████████████▌ | 390/1019 [30:52<43:37, 4.16s/it]
{'loss': 4.2844, 'learning_rate': 7.72e-06, 'epoch': 0.38}
38%|██████████████████████████████▋ | 391/1019 [30:55<41:35, 3.97s/it]
{'loss': 4.1876, 'learning_rate': 7.74e-06, 'epoch': 0.38}
38%|██████████████████████████████▊ | 392/1019 [30:59<39:48, 3.81s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:03:45,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
39%|██████████████████████████████▊ | 393/1019 [31:02<37:44, 3.62s/it]
{'loss': 4.2759, 'learning_rate': 7.78e-06, 'epoch': 0.39}
39%|██████████████████████████████▉ | 394/1019 [31:05<35:46, 3.43s/it]
[WARNING|modeling_utils.py:388] 
2022-02-28 23:03:50,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
39%|███████████████████████████████ | 395/1019 [31:07<33:05, 3.18s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
39%|███████████████████████████████ | 396/1019 [31:10<30:20, 2.92s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:03:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:03:58,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:03:59,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:04:01,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2588, 'learning_rate': 7.92e-06, 'epoch': 0.39}
39%|███████████████████████████████▍ | 401/1019 [31:23<34:02, 3.31s/it]
{'loss': 4.1771, 'learning_rate': 7.94e-06, 'epoch': 0.39}
39%|███████████████████████████████▌ | 402/1019 [31:29<41:54, 4.07s/it]
40%|███████████████████████████████▋ | 403/1019 [31:35<47:03, 4.58s/it]
40%|███████████████████████████████▋ | 404/1019 [31:41<50:12, 4.90s/it]
40%|███████████████████████████████▊ | 405/1019 [31:46<52:39, 5.15s/it]
{'loss': 4.0577, 'learning_rate': 8.040000000000001e-06, 'epoch': 0.4}
40%|███████████████████████████████▉ | 407/1019 [31:58<55:10, 5.41s/it]
{'loss': 4.0773, 'learning_rate': 8.06e-06, 'epoch': 0.4}
40%|████████████████████████████████ | 408/1019 [32:03<55:42, 5.47s/it]
{'loss': 4.0899, 'learning_rate': 8.08e-06, 'epoch': 0.4}
40%|████████████████████████████████ | 409/1019 [32:09<56:12, 5.53s/it]
{'loss': 4.1628, 'learning_rate': 8.120000000000002e-06, 'epoch': 0.4}
40%|████████████████████████████████▎ | 411/1019 [32:20<55:48, 5.51s/it]
{'loss': 4.2213, 'learning_rate': 8.14e-06, 'epoch': 0.4}
40%|████████████████████████████████▎ | 412/1019 [32:25<55:31, 5.49s/it]
41%|████████████████████████████████▍ | 413/1019 [32:31<55:22, 5.48s/it]
{'loss': 4.031, 'learning_rate': 8.18e-06, 'epoch': 0.41}
41%|████████████████████████████████▌ | 414/1019 [32:36<55:07, 5.47s/it]
{'loss': 4.3586, 'learning_rate': 8.2e-06, 'epoch': 0.41}
41%|████████████████████████████████▌ | 415/1019 [32:42<54:40, 5.43s/it]
41%|████████████████████████████████▋ | 416/1019 [32:47<54:29, 5.42s/it]
{'loss': 4.0652, 'learning_rate': 8.24e-06, 'epoch': 0.41}
41%|████████████████████████████████▋ | 417/1019 [32:52<54:01, 5.38s/it]
41%|████████████████████████████████▊ | 418/1019 [32:58<53:51, 5.38s/it]
{'loss': 4.3404, 'learning_rate': 8.28e-06, 'epoch': 0.41}
41%|████████████████████████████████▉ | 419/1019 [33:03<53:24, 5.34s/it]
{'loss': 4.2375, 'learning_rate': 8.3e-06, 'epoch': 0.41}
41%|████████████████████████████████▉ | 420/1019 [33:08<52:58, 5.31s/it]
{'loss': 4.3139, 'learning_rate': 8.32e-06, 'epoch': 0.41}
41%|█████████████████████████████████ | 421/1019 [33:13<52:46, 5.30s/it]
{'loss': 4.1704, 'learning_rate': 8.34e-06, 'epoch': 0.41}
{'loss': 4.4097, 'learning_rate': 8.36e-06, 'epoch': 0.41}
42%|█████████████████████████████████▏ | 423/1019 [33:24<51:44, 5.21s/it]
{'loss': 4.1312, 'learning_rate': 8.380000000000001e-06, 'epoch': 0.41}
42%|█████████████████████████████████▎ | 424/1019 [33:29<51:12, 5.16s/it]
{'loss': 4.2479, 'learning_rate': 8.400000000000001e-06, 'epoch': 0.42}
42%|█████████████████████████████████▎ | 425/1019 
[33:34<50:45, 5.13s/it]
{'loss': 4.1687, 'learning_rate': 8.42e-06, 'epoch': 0.42}
42%|█████████████████████████████████▍ | 426/1019 [33:39<50:12, 5.08s/it]
{'loss': 4.2207, 'learning_rate': 8.44e-06, 'epoch': 0.42}
42%|█████████████████████████████████▌ | 427/1019 [33:44<49:50, 5.05s/it]
42%|█████████████████████████████████▌ | 428/1019 [33:49<49:28, 5.02s/it]
{'loss': 4.2921, 'learning_rate': 8.48e-06, 'epoch': 0.42}
42%|█████████████████████████████████▋ | 429/1019 [33:53<48:50, 4.97s/it]
{'loss': 4.2933, 'learning_rate': 8.5e-06, 'epoch': 0.42}
42%|█████████████████████████████████▊ | 430/1019 [33:58<48:14, 4.91s/it]
{'loss': 4.33, 'learning_rate': 8.52e-06, 'epoch': 0.42}
42%|█████████████████████████████████▊ | 431/1019 [34:03<47:48, 4.88s/it]
{'loss': 4.2059, 'learning_rate': 8.540000000000001e-06, 'epoch': 0.42}
42%|█████████████████████████████████▉ | 432/1019 [34:08<47:04, 4.81s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, 
floating-point operations will not be computed 42%|█████████████████████████████████▉ | 432/1019 [34:08<47:04, 4.81s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2752, 'learning_rate': 8.560000000000001e-06, 'epoch': 0.42} 42%|█████████████████████████████████▉ | 433/1019 [34:12<46:25, 4.75s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 42%|█████████████████████████████████▉ | 433/1019 [34:12<46:25, 4.75s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2522, 'learning_rate': 8.580000000000001e-06, 'epoch': 0.42} 43%|██████████████████████████████████ | 434/1019 [34:17<45:35, 4.68s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████ | 434/1019 [34:17<45:35, 4.68s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2236, 'learning_rate': 8.6e-06, 'epoch': 0.43} 43%|██████████████████████████████████▏ | 435/1019 [34:21<44:45, 4.60s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▏ | 435/1019 [34:21<44:45, 4.60s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1062, 'learning_rate': 8.62e-06, 'epoch': 0.43} 43%|██████████████████████████████████▏ | 436/1019 
[34:26<43:55, 4.52s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▏ | 436/1019 [34:26<43:55, 4.52s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2164, 'learning_rate': 8.64e-06, 'epoch': 0.43} 43%|██████████████████████████████████▏ | 436/1019 [34:26<43:55, 4.52s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▎ | 437/1019 [34:30<43:14, 4.46s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▎ | 437/1019 [34:30<43:14, 4.46s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▎ | 437/1019 [34:30<43:14, 4.46s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▍ | 438/1019 [34:34<42:21, 4.37s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▍ | 438/1019 [34:34<42:21, 4.37s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▍ | 438/1019 [34:34<42:21, 4.37s/it]g-point operations will not be computed-28 
23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▍ | 439/1019 [34:38<41:18, 4.27s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▍ | 439/1019 [34:38<41:18, 4.27s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▍ | 439/1019 [34:38<41:18, 4.27s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▌ | 440/1019 [34:42<40:06, 4.16s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▌ | 440/1019 [34:42<40:06, 4.16s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▌ | 440/1019 [34:42<40:06, 4.16s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▌ | 441/1019 [34:46<38:50, 4.03s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▌ | 441/1019 [34:46<38:50, 4.03s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
43%|██████████████████████████████████▌ | 441/1019 [34:46<38:50, 4.03s/it]g-point operations will not be computed-28 23:03:53,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▋ | 442/1019 [34:49<37:26, 3.89s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:07:35,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▋ | 442/1019 [34:49<37:26, 3.89s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:07:35,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▊ | 443/1019 [34:53<35:36, 3.71s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:07:35,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▊ | 443/1019 [34:53<35:36, 3.71s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:07:35,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 43%|██████████████████████████████████▊ | 443/1019 [34:53<35:36, 3.71s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:07:35,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|██████████████████████████████████▊ | 444/1019 [34:56<33:44, 3.52s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|██████████████████████████████████▊ | 444/1019 [34:56<33:44, 3.52s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|██████████████████████████████████▉ | 445/1019 [34:58<31:33, 3.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 
23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|██████████████████████████████████▉ | 445/1019 [34:58<31:33, 3.30s/it][WARNING|modeling_utils.py:388] 2022-02-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:45,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:45,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:47,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:47,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:49,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:49,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:51,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:51,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:53,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:53,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-02-28 23:07:53,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▍ | 451/1019 [35:15<32:13, 3.40s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▍ | 451/1019 [35:15<32:13, 3.40s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▍ | 451/1019 [35:15<32:13, 3.40s/it]g-point operations 
will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▍ | 452/1019 [35:21<39:31, 4.18s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▍ | 452/1019 [35:21<39:31, 4.18s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▍ | 452/1019 [35:21<39:31, 4.18s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▌ | 453/1019 [35:27<44:21, 4.70s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▌ | 453/1019 [35:27<44:21, 4.70s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|███████████████████████████████████▌ | 453/1019 [35:27<44:21, 4.70s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▋ | 454/1019 [35:33<47:35, 5.05s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▋ | 454/1019 [35:33<47:35, 5.05s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed 45%|███████████████████████████████████▋ | 454/1019 [35:33<47:35, 5.05s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▋ | 455/1019 [35:38<49:31, 5.27s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▋ | 455/1019 [35:38<49:31, 5.27s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▋ | 455/1019 [35:38<49:31, 5.27s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▊ | 456/1019 [35:44<50:41, 5.40s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▊ | 456/1019 [35:44<50:41, 5.40s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▉ | 457/1019 [35:50<51:33, 5.50s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▉ | 457/1019 [35:50<51:33, 5.50s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.0692, 'learning_rate': 9.060000000000001e-06, 'epoch': 0.45} 
45%|███████████████████████████████████▉ | 457/1019 [35:50<51:33, 5.50s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▉ | 458/1019 [35:56<52:00, 5.56s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▉ | 458/1019 [35:56<52:00, 5.56s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|███████████████████████████████████▉ | 458/1019 [35:56<52:00, 5.56s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████ | 459/1019 [36:01<52:04, 5.58s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████ | 459/1019 [36:01<52:04, 5.58s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████ | 459/1019 [36:01<52:04, 5.58s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████ | 460/1019 [36:07<51:51, 5.57s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████ | 460/1019 [36:07<51:51, 5.57s/it]g-point operations will not be computed-28 
23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████▏ | 461/1019 [36:12<51:56, 5.58s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████▏ | 461/1019 [36:12<51:56, 5.58s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1178, 'learning_rate': 9.14e-06, 'epoch': 0.45} 45%|████████████████████████████████████▏ | 461/1019 [36:12<51:56, 5.58s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████▎ | 462/1019 [36:18<51:36, 5.56s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████▎ | 462/1019 [36:18<51:36, 5.56s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████▎ | 462/1019 [36:18<51:36, 5.56s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████▎ | 463/1019 [36:23<51:18, 5.54s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 45%|████████████████████████████████████▎ | 463/1019 [36:23<51:18, 5.54s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▍ | 464/1019 [36:29<51:08, 5.53s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▍ | 464/1019 [36:29<51:08, 5.53s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.0738, 'learning_rate': 9.200000000000002e-06, 'epoch': 0.46} 46%|████████████████████████████████████▍ | 464/1019 [36:29<51:08, 5.53s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▌ | 465/1019 [36:34<50:46, 5.50s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▌ | 465/1019 [36:34<50:46, 5.50s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▌ | 465/1019 [36:34<50:46, 5.50s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▌ | 465/1019 [36:34<50:46, 5.50s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1812, 'learning_rate': 9.240000000000001e-06, 'epoch': 0.46} 46%|████████████████████████████████████▌ | 465/1019 [36:34<50:46, 5.50s/it]g-point operations will not be computed-28 23:07:41,948 >> 
Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▋ | 467/1019 [36:45<50:13, 5.46s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▋ | 467/1019 [36:45<50:13, 5.46s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.256, 'learning_rate': 9.260000000000001e-06, 'epoch': 0.46} 46%|████████████████████████████████████▋ | 467/1019 [36:45<50:13, 5.46s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▋ | 468/1019 [36:51<49:57, 5.44s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▋ | 468/1019 [36:51<49:57, 5.44s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▊ | 469/1019 [36:56<49:26, 5.39s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▊ | 469/1019 [36:56<49:26, 5.39s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 3.9931, 'learning_rate': 9.3e-06, 'epoch': 0.46} 46%|████████████████████████████████████▊ | 469/1019 [36:56<49:26, 5.39s/it]g-point operations will not be 
computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▉ | 470/1019 [37:01<48:59, 5.35s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▉ | 470/1019 [37:01<48:59, 5.35s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▉ | 471/1019 [37:06<48:41, 5.33s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|████████████████████████████████████▉ | 471/1019 [37:06<48:41, 5.33s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2565, 'learning_rate': 9.340000000000002e-06, 'epoch': 0.46} 46%|█████████████████████████████████████ | 472/1019 [37:12<48:21, 5.30s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|█████████████████████████████████████ | 472/1019 [37:12<48:21, 5.30s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3811, 'learning_rate': 9.360000000000002e-06, 'epoch': 0.46} 46%|█████████████████████████████████████▏ | 473/1019 [37:17<48:01, 5.28s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 46%|█████████████████████████████████████▏ | 473/1019 [37:17<48:01, 
5.28s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2838, 'learning_rate': 9.38e-06, 'epoch': 0.46} 47%|█████████████████████████████████████▏ | 474/1019 [37:22<47:50, 5.27s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|█████████████████████████████████████▏ | 474/1019 [37:22<47:50, 5.27s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1582, 'learning_rate': 9.4e-06, 'epoch': 0.46} 47%|█████████████████████████████████████▏ | 474/1019 [37:22<47:50, 5.27s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|█████████████████████████████████████▎ | 475/1019 [37:27<47:20, 5.22s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|█████████████████████████████████████▎ | 475/1019 [37:27<47:20, 5.22s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|█████████████████████████████████████▎ | 475/1019 [37:27<47:20, 5.22s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|█████████████████████████████████████▎ | 476/1019 [37:32<46:54, 5.18s/it]g-point operations will not be computed-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 47%|█████████████████████████████████████▎ | 476/1019 
[37:32<46:54, 5.18s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:07:41,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
47%|█████████████████████████████████████▍ | 477/1019 [37:37<46:24, 5.14s/it]
{'loss': 4.0656, 'learning_rate': 9.460000000000001e-06, 'epoch': 0.47}
47%|█████████████████████████████████████▌ | 478/1019 [37:42<45:54, 5.09s/it]
47%|█████████████████████████████████████▌ | 479/1019 [37:47<45:31, 5.06s/it]
47%|█████████████████████████████████████▋ | 480/1019 [37:52<45:21, 5.05s/it]
{'loss': 4.2368, 'learning_rate': 9.52e-06, 'epoch': 0.47}
47%|█████████████████████████████████████▊ | 481/1019 [37:57<44:47, 4.99s/it]
47%|█████████████████████████████████████▊ | 482/1019 [38:02<44:18, 4.95s/it]
47%|█████████████████████████████████████▉ | 483/1019 [38:07<43:51, 4.91s/it]
47%|█████████████████████████████████████▉ | 484/1019 [38:12<43:18, 4.86s/it]
48%|██████████████████████████████████████ | 485/1019 [38:16<42:26, 4.77s/it]
48%|██████████████████████████████████████▏ | 486/1019 [38:21<41:42, 4.69s/it]
{'loss': 4.4885, 'learning_rate': 9.640000000000001e-06, 'epoch': 0.48}
48%|██████████████████████████████████████▏ | 487/1019 [38:25<40:52, 4.61s/it]
{'loss': 4.3354, 'learning_rate': 9.66e-06, 'epoch': 0.48}
{'loss': 4.0316, 'learning_rate': 9.68e-06, 'epoch': 0.48}
48%|██████████████████████████████████████▍ | 489/1019 [38:34<39:08, 4.43s/it]
{'loss': 4.1756, 'learning_rate': 9.7e-06, 'epoch': 0.48}
48%|██████████████████████████████████████▍ | 490/1019 [38:38<38:04, 4.32s/it]
{'loss': 4.123, 'learning_rate': 9.72e-06, 'epoch': 0.48}
48%|██████████████████████████████████████▌ | 491/1019 [38:42<36:50, 4.19s/it]
{'loss': 4.2564, 'learning_rate': 9.74e-06, 'epoch': 0.48}
48%|██████████████████████████████████████▋ | 492/1019 [38:45<35:06, 4.00s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:11:33,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2812, 'learning_rate': 9.780000000000001e-06, 'epoch': 0.48}
[WARNING|modeling_utils.py:388] 2022-02-28 23:11:36,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2238, 'learning_rate': 9.800000000000001e-06, 'epoch': 0.48}
49%|██████████████████████████████████████▊ | 495/1019 [38:54<29:23, 3.37s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:11:40,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
49%|██████████████████████████████████████▉ | 496/1019 [38:57<27:16, 3.13s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:11:42,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
49%|███████████████████████████████████████ | 497/1019 [38:59<25:08, 2.89s/it]
[WARNING|modeling_utils.py:388] 2022-02-28 23:11:46,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-02-28 23:11:47,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[INFO|trainer.py:2369] 2022-02-28 23:11:49,922 >> Batch size = 14
0%| | 0/189 [00:00
1%|▉ | 2/189 [00:02<04:02, 1.29s/it]
2%|█▎ | 3/189 [00:06<06:55, 2.23s/it]
2%|█▊ | 4/189 [00:09<07:44, 2.51s/it]
3%|██▏ | 5/189 [00:12<08:54, 2.91s/it]
3%|██▋ | 6/189 [00:16<10:07, 3.32s/it]
4%|███ | 7/189 [00:20<09:59, 3.29s/it]
4%|███▌ | 8/189 [00:23<09:51, 3.27s/it]
5%|███▉ | 9/189 [00:28<11:10, 3.73s/it]
5%|████▎ | 10/189 [00:32<11:24, 3.82s/it]
6%|████▊ | 11/189 [00:35<10:51, 3.66s/it]
RuntimeError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 15.78 GiB total capacity; 9.19 GiB already allocated; 1.65 GiB free; 12.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF