Reading metadata...: 2165it [00:00, 5074.17it/s]
| 0/60000 [00:00> The following columns in the training set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: input_length. If input_length are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.
Reading metadata...: 1650it [00:00, 9922.55it/s]
Reading metadata...: 2165it [00:00, 9148.71it/s]
Reading metadata...: 1650it [00:00, 9978.95it/s]
Reading metadata...: 2165it [00:00, 3385.77it/s]
Reading metadata...: 1650it [00:00, 2859.90it/s]
Reading metadata...: 2165it [00:00, 13071.50it/s]
Reading metadata...: 1650it [00:00, 10561.77it/s]
Reading metadata...: 1650it [00:00, 10692.94it/s]
Reading metadata...: 2165it [00:00, 12446.52it/s]
Reading metadata...: 1650it [00:00, 10263.01it/s]
Reading metadata...: 2165it [00:00, 12827.65it/s]
Reading metadata...: 1650it [00:00, 7121.46it/s]
[WARNING|logging.py:329] 2023-11-19 12:43:04,273 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
12%|██████████▉ | 7020/60000 [1:48:51<54:19:02, 3.69s/it] 12%|██████████▉ | 7039/60000 [1:52:10<145:26:57, 9.89s/it] 12%|██████████▉ | 7060/60000 [1:55:35<141:19:25, 9.61s/it] 12%|██████████▉ | 7079/60000 [1:58:40<141:30:54, 9.63s/it] 12%|███████████ | 7099/60000 [2:01:58<147:19:24, 10.03s/it] 12%|███████████ | 7119/60000 [2:05:52<155:05:31, 10.56s/it] 12%|███████████ | 7139/60000 [2:09:07<140:51:36, 9.59s/it] 12%|███████████ | 7159/60000 [2:12:27<147:51:28, 10.07s/it] 12%|███████████▏ | 7179/60000 [2:15:47<148:51:34, 10.15s/it] 12%|███████████▏ | 7184/60000 [2:16:38<151:38:50, 10.34s/it] 12%|███████████▏ | 7185/60000 [2:16:46<137:41:51, 9.39s/it] 12%|███████████▏ | 7199/60000 [2:19:03<146:31:49, 9.99s/it] 12%|███████████▏ | 7219/60000 [2:22:23<143:09:26, 9.76s/it] 12%|███████████▏ | 7239/60000 [2:25:57<145:28:31, 9.93s/it] 12%|███████████▎ | 7260/60000 [2:29:21<143:56:42, 9.83s/it] 12%|███████████▎ | 7280/60000 [2:32:34<139:37:45, 9.53s/it] 12%|███████████▎ | 7300/60000 [2:36:04<159:57:25, 10.93s/it] 12%|███████████▎ | 7320/60000 [2:39:18<140:15:43, 9.59s/it] 12%|███████████▍ | 7339/60000 [2:42:47<143:25:48, 9.81s/it] 12%|███████████▍ | 7359/60000 [2:46:10<153:27:55, 10.50s/it] 12%|███████████▍ | 7379/60000 [2:49:43<154:20:08, 10.56s/it] 12%|███████████▍ | 7400/60000 [2:53:23<149:40:11, 10.24s/it] 12%|███████████▍ | 7419/60000 [2:56:47<163:53:38, 11.22s/it] 12%|███████████▌ | 7440/60000 [3:00:26<150:13:54, 10.29s/it] 12%|███████████▌ | 7460/60000 [3:03:50<145:27:29, 9.97s/it] 12%|███████████▌ | 7479/60000 [3:07:06<144:05:21, 9.88s/it] 12%|███████████▌ | 7499/60000 [3:10:39<152:35:11, 10.46s/it] 13%|███████████▋ | 7519/60000 [3:14:05<146:11:49, 10.03s/it] 13%|███████████▋ | 7540/60000 [3:17:37<144:43:39, 9.93s/it] 13%|███████████▋ | 7560/60000 [3:20:54<144:28:51, 9.92s/it] 13%|███████████▋ | 7579/60000 [3:24:18<194:26:36, 13.35s/it] 13%|███████████▊ | 7599/60000 [3:27:40<141:47:50, 9.74s/it] 13%|███████████▊ | 7619/60000 [3:30:59<142:47:21, 9.81s/it] 
13%|███████████▊ | 7639/60000 [3:34:26<142:47:25, 9.82s/it] 13%|███████████▊ | 7660/60000 [3:37:55<143:03:54, 9.84s/it] 13%|███████████▉ | 7680/60000 [3:41:13<142:01:17, 9.77s/it] 13%|███████████▉ | 7699/60000 [3:44:24<149:41:03, 10.30s/it] 13%|███████████▉ | 7720/60000 [3:47:55<142:07:32, 9.79s/it] 13%|███████████▉ | 7740/60000 [3:51:22<150:36:37, 10.38s/it] 13%|████████████ | 7760/60000 [3:54:39<143:12:59, 9.87s/it] Reading metadata...: 2165it [00:00, 11232.32it/s] | 7764/60000 [3:55:18<141:36:50, 9.76s/it] 13%|████████████ | 7780/60000 [3:57:58<142:23:04, 9.82s/it] 13%|████████████ | 7800/60000 [4:01:16<143:48:28, 9.92s/it] 13%|████████████ | 7819/60000 [4:04:23<141:22:38, 9.75s/it] 13%|████████████▏ | 7840/60000 [4:07:55<141:49:37, 9.79s/it] 13%|████████████▏ | 7860/60000 [4:11:13<144:03:14, 9.95s/it] 13%|████████████▏ | 7880/60000 [4:14:31<143:04:24, 9.88s/it] Reading metadata...: 1650it [00:00, 8071.81it/s] | 7890/60000 [4:16:11<144:04:40, 9.95s/it] 13%|████████████▏ | 7899/60000 [4:17:43<143:24:54, 9.91s/it] 13%|████████████▎ | 7920/60000 [4:21:16<144:10:07, 9.97s/it] 13%|████████████▎ | 7940/60000 [4:24:37<147:04:09, 10.17s/it] 13%|████████████▎ | 7960/60000 [4:27:56<141:30:53, 9.79s/it] 13%|████████████▎ | 7980/60000 [4:31:22<143:06:47, 9.90s/it] 13%|████████████▍ | 8000/60000 [4:34:47<153:40:03, 10.64s/it][INFO|trainer.py:3173] 2023-11-19 15:34:02,297 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-19 15:34:02,297 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-19 15:34:02,297 >> Batch size = 4 Reading metadata...: 1704it [00:00, 9642.87it/s] Reading metadata...: 1it [00:00, 5.99it/s] [INFO|trainer_utils.py:759] 2023-11-19 15:34:03,285 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. 
If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 13%|████████████▍ | 8000/60000 [4:45:00<153:40:03, 10.64s/it] 13%|████████████▍ | 8000/60000 [4:45:00<153:40:03, 10.64s/it][INFO|trainer.py:2896] 2023-11-19 15:44:16,572 >> Saving model checkpoint to ./checkpoint-8000 [INFO|configuration_utils.py:462] 2023-11-19 15:44:16,582 >> Configuration saved in ./checkpoint-8000/config.json [INFO|configuration_utils.py:568] 2023-11-19 15:44:16,586 >> Configuration saved in ./checkpoint-8000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-19 15:44:35,825 >> Model weights saved in ./checkpoint-8000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-19 15:44:35,831 >> Feature extractor saved in ./checkpoint-8000/preprocessor_config.json [2023-11-19 15:44:40,584] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step8000 is about to be saved! [2023-11-19 15:44:40,619] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-8000/global_step8000/mp_rank_00_model_states.pt [2023-11-19 15:44:40,620] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-8000/global_step8000/mp_rank_00_model_states.pt... [2023-11-19 15:45:19,962] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-8000/global_step8000/mp_rank_00_model_states.pt. [2023-11-19 15:45:20,051] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-19 15:45:45,610] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2023-11-19 15:45:45,626] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-19 15:45:45,632] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step8000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-19 15:46:46,902 >> Feature extractor saved in ./preprocessor_config.json 13%|████████████▍ | 8020/60000 [4:50:58<146:46:26, 10.17s/it] 13%|████████████▍ | 8039/60000 [4:54:06<141:10:33, 9.78s/it] 13%|████████████▍ | 8059/60000 [4:57:57<199:07:18, 13.80s/it] 13%|████████████▌ | 8079/60000 [5:01:12<137:59:55, 9.57s/it] 13%|████████████▌ | 8099/60000 [5:04:44<141:41:08, 9.83s/it] 14%|████████████▌ | 8119/60000 [5:08:07<145:13:16, 10.08s/it] 14%|████████████▌ | 8139/60000 [5:11:51<187:32:02, 13.02s/it] 14%|████████████▋ | 8159/60000 [5:15:10<140:39:12, 9.77s/it] 14%|████████████▋ | 8179/60000 [5:18:28<141:24:04, 9.82s/it] 14%|████████████▋ | 8186/60000 [5:19:37<141:50:31, 9.86s/it] 14%|████████████▋ | 8187/60000 [5:19:43<127:16:19, 8.84s/it] 14%|████████████▋ | 8199/60000 [5:21:35<136:36:17, 9.49s/it] 14%|████████████▋ | 8219/60000 [5:24:47<137:18:55, 9.55s/it] 14%|████████████▊ | 8239/60000 [5:28:14<166:10:17, 11.56s/it] 14%|████████████▊ | 8259/60000 [5:31:30<140:56:08, 9.81s/it] 14%|████████████▊ | 8279/60000 [5:34:46<136:44:33, 9.52s/it] 14%|████████████▊ | 8299/60000 [5:38:12<163:33:15, 11.39s/it] 14%|████████████▉ | 8320/60000 [5:41:35<138:53:27, 9.68s/it] 14%|████████████▉ | 8339/60000 [5:44:37<136:19:21, 9.50s/it] 14%|████████████▉ | 8359/60000 [5:47:54<137:59:34, 9.62s/it] 14%|████████████▉ | 8379/60000 [5:51:07<139:22:49, 9.72s/it] 14%|█████████████ | 8400/60000 [5:54:27<137:15:03, 9.58s/it] 14%|█████████████ | 8419/60000 [5:57:31<135:38:06, 9.47s/it] 14%|█████████████ | 8439/60000 [6:00:49<154:22:19, 10.78s/it] 14%|█████████████ | 8459/60000 [6:04:01<136:56:17, 9.56s/it] 14%|█████████████▏ | 8479/60000 [6:07:14<137:29:35, 
9.61s/it] 14%|█████████████▏ | 8499/60000 [6:10:28<140:54:19, 9.85s/it] 14%|█████████████▏ | 8520/60000 [6:13:51<140:34:02, 9.83s/it] 14%|█████████████▏ | 8540/60000 [6:17:21<136:48:57, 9.57s/it] 14%|█████████████▎ | 8559/60000 [6:20:24<137:45:11, 9.64s/it] 14%|█████████████▎ | 8580/60000 [6:23:45<138:07:42, 9.67s/it] 14%|█████████████▎ | 8594/60000 [6:25:58<123:58:31, 8.68s/it] 14%|█████████████▎ | 8599/60000 [6:26:47<136:29:25, 9.56s/it] 14%|█████████████▎ | 8619/60000 [6:30:05<144:26:38, 10.12s/it] 14%|█████████████▍ | 8639/60000 [6:33:27<137:50:45, 9.66s/it] 14%|█████████████▍ | 8660/60000 [6:36:48<134:50:43, 9.46s/it] 14%|█████████████▍ | 8679/60000 [6:39:50<136:05:44, 9.55s/it] 14%|█████████████▍ | 8699/60000 [6:43:04<140:09:38, 9.84s/it] 15%|█████████████▌ | 8719/60000 [6:46:22<142:20:45, 9.99s/it] 15%|█████████████▌ | 8739/60000 [6:49:35<139:38:15, 9.81s/it] 15%|█████████████▌ | 8759/60000 [6:52:50<141:23:58, 9.93s/it] 15%|█████████████▌ | 8779/60000 [6:56:03<134:53:33, 9.48s/it] 15%|█████████████▋ | 8799/60000 [6:59:46<149:59:39, 10.55s/it] 15%|█████████████▋ | 8819/60000 [7:03:00<137:06:16, 9.64s/it] 15%|█████████████▋ | 8839/60000 [7:06:14<137:54:27, 9.70s/it] 15%|█████████████▋ | 8859/60000 [7:09:29<138:16:07, 9.73s/it] 15%|█████████████▊ | 8879/60000 [7:12:49<158:57:19, 11.19s/it] Reading metadata...: 1650it [00:00, 9158.08it/s] | 8880/60000 [7:12:58<151:08:54, 10.64s/it] 15%|█████████████▊ | 8900/60000 [7:16:50<139:26:23, 9.82s/it] 15%|█████████████▊ | 8919/60000 [7:19:57<142:39:26, 10.05s/it] 15%|█████████████▊ | 8939/60000 [7:23:11<139:14:39, 9.82s/it] 15%|█████████████▉ | 8959/60000 [7:26:26<136:56:34, 9.66s/it] 15%|█████████████▉ | 8979/60000 [7:29:55<140:01:11, 9.88s/it] 15%|█████████████▉ | 8999/60000 [7:33:22<139:13:16, 9.83s/it] 15%|█████████████▉ | 9000/60000 [7:33:32<140:19:23, 9.91s/it][INFO|trainer.py:3173] 2023-11-19 18:32:46,803 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-19 18:32:46,803 >> Num examples: Unknown 
[INFO|trainer.py:3178] 2023-11-19 18:32:46,803 >> Batch size = 4 [INFO|trainer_utils.py:759] 2023-11-19 18:32:49,705 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 15%|█████████████▉ | 9000/60000 [7:43:57<140:19:23, 9.91s/it] 15%|█████████████▉ | 9000/60000 [7:43:57<140:19:23, 9.91s/it][INFO|trainer.py:2896] 2023-11-19 18:43:40,779 >> Saving model checkpoint to ./checkpoint-9000 [INFO|configuration_utils.py:462] 2023-11-19 18:43:40,806 >> Configuration saved in ./checkpoint-9000/config.json [INFO|configuration_utils.py:568] 2023-11-19 18:43:40,819 >> Configuration saved in ./checkpoint-9000/generation_config.json [2023-11-19 18:44:31,177] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step9000 is about to be saved! [2023-11-19 18:44:31,206] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-9000/global_step9000/mp_rank_00_model_states.pt [2023-11-19 18:44:31,207] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-9000/global_step9000/mp_rank_00_model_states.pt... [INFO|modeling_utils.py:2194] 2023-11-19 18:44:31,150 >> Model weights saved in ./checkpoint-9000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-19 18:44:31,155 >> Feature extractor saved in ./checkpoint-9000/preprocessor_config.json [2023-11-19 18:44:39,043] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-9000/global_step9000/mp_rank_00_model_states.pt. [2023-11-19 18:44:39,053] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt... 
[2023-11-19 18:45:00,369] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-19 18:45:00,387] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-19 18:45:00,387] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step9000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-19 18:46:02,212 >> Feature extractor saved in ./preprocessor_config.json 15%|█████████████▉ | 9020/60000 [7:50:14<148:53:42, 10.51s/it] 15%|██████████████ | 9039/60000 [7:53:29<143:37:22, 10.15s/it] 15%|██████████████ | 9059/60000 [7:57:24<150:38:34, 10.65s/it] Reading metadata...: 2165it [00:00, 13203.44it/s] | 9063/60000 [7:58:06<148:20:25, 10.48s/it] 15%|██████████████ | 9079/60000 [8:00:46<139:41:59, 9.88s/it] 15%|██████████████ | 9100/60000 [8:04:13<140:10:03, 9.91s/it] 15%|██████████████▏ | 9120/60000 [8:07:29<140:26:05, 9.94s/it] 15%|██████████████▏ | 9139/60000 [8:10:36<138:42:35, 9.82s/it] 15%|██████████████▏ | 9160/60000 [8:14:05<135:45:22, 9.61s/it] 15%|██████████████▏ | 9179/60000 [8:17:11<139:31:44, 9.88s/it] 15%|██████████████▎ | 9199/60000 [8:20:26<138:00:53, 9.78s/it] 15%|██████████████▎ | 9219/60000 [8:23:42<141:21:29, 10.02s/it] 15%|██████████████▎ | 9240/60000 [8:27:12<136:53:31, 9.71s/it] 15%|██████████████▎ | 9260/60000 [8:30:27<139:31:26, 9.90s/it] 15%|██████████████▍ | 9280/60000 [8:33:47<138:07:12, 9.80s/it] 16%|██████████████▍ | 9300/60000 [8:37:18<155:43:49, 11.06s/it] 16%|██████████████▍ | 9320/60000 [8:40:33<137:56:38, 9.80s/it] 16%|██████████████▍ | 9340/60000 [8:43:54<135:14:12, 9.61s/it] 16%|██████████████▌ | 9359/60000 [8:46:57<135:55:16, 9.66s/it] 16%|██████████████▌ | 9380/60000 [8:50:24<138:56:53, 9.88s/it] 16%|██████████████▌ | 9399/60000 [8:53:30<136:26:44, 9.71s/it] 16%|██████████████▌ | 9420/60000 [8:57:02<136:17:10, 9.70s/it] 
16%|██████████████▋ | 9439/60000 [9:00:08<137:23:45, 9.78s/it] 16%|██████████████▋ | 9460/60000 [9:03:34<135:37:24, 9.66s/it] 16%|██████████████▋ | 9479/60000 [9:06:42<138:02:13, 9.84s/it] 16%|██████████████▋ | 9500/60000 [9:10:32<141:53:41, 10.12s/it] 16%|██████████████▊ | 9520/60000 [9:13:48<136:25:05, 9.73s/it] 16%|██████████████▊ | 9539/60000 [9:17:20<139:20:31, 9.94s/it] 16%|██████████████▊ | 9560/60000 [9:20:43<135:20:50, 9.66s/it] 16%|██████████████▊ | 9579/60000 [9:24:10<152:37:33, 10.90s/it] 16%|██████████████▉ | 9600/60000 [9:28:05<157:02:48, 11.22s/it] 16%|██████████████▉ | 9620/60000 [9:31:57<159:55:36, 11.43s/it] 16%|██████████████▉ | 9639/60000 [9:35:42<159:54:30, 11.43s/it] 16%|██████████████▉ | 9660/60000 [9:39:39<157:39:24, 11.27s/it] 16%|███████████████ | 9680/60000 [9:43:30<161:29:53, 11.55s/it] 16%|███████████████ | 9700/60000 [9:47:15<155:47:13, 11.15s/it] 16%|███████████████ | 9720/60000 [9:51:08<160:21:19, 11.48s/it] 16%|███████████████ | 9740/60000 [9:54:54<150:39:54, 10.79s/it] 16%|███████████████▏ | 9760/60000 [9:58:46<159:52:31, 11.46s/it] 16%|██████████████▉ | 9780/60000 [10:02:31<154:33:53, 11.08s/it] 16%|███████████████ | 9799/60000 [10:05:55<147:58:40, 10.61s/it] 16%|███████████████ | 9819/60000 [10:09:21<144:26:43, 10.36s/it] 16%|███████████████ | 9840/60000 [10:12:51<138:48:21, 9.96s/it] 16%|███████████████ | 9860/60000 [10:16:20<144:17:09, 10.36s/it] 16%|███████████████▏ | 9870/60000 [10:18:03<141:11:11, 10.14s/it] 16%|███████████████▏ | 9880/60000 [10:19:43<140:36:51, 10.10s/it] 16%|███████████████▏ | 9900/60000 [10:23:07<144:03:17, 10.35s/it] 17%|███████████████▏ | 9920/60000 [10:26:37<145:24:32, 10.45s/it] 17%|███████████████▏ | 9932/60000 [10:28:36<126:49:14, 9.12s/it] 17%|███████████████▏ | 9940/60000 [10:30:03<149:14:38, 10.73s/it] 17%|███████████████▎ | 9960/60000 [10:33:45<146:49:27, 10.56s/it] 17%|███████████████▎ | 9979/60000 [10:37:06<138:30:05, 9.97s/it] 17%|███████████████▎ | 9999/60000 [10:40:30<145:31:57, 10.48s/it] 
17%|███████████████▏ | 10000/60000 [10:40:41<148:35:05, 10.70s/it][INFO|trainer.py:3173] 2023-11-19 21:39:56,447 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-19 21:39:56,448 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-19 21:39:56,448 >> Batch size = 4 Reading metadata...: 1704it [00:00, 7814.29it/s] [INFO|trainer_utils.py:759] 2023-11-19 21:39:57,946 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 17%|███████████████▏ | 10000/60000 [10:52:32<148:35:05, 10.70s/it] 17%|███████████████▏ | 10000/60000 [10:52:32<148:35:05, 10.70s/it][INFO|trainer.py:2896] 2023-11-19 21:52:11,864 >> Saving model checkpoint to ./checkpoint-10000 [INFO|configuration_utils.py:462] 2023-11-19 21:52:11,875 >> Configuration saved in ./checkpoint-10000/config.json [INFO|configuration_utils.py:568] 2023-11-19 21:52:11,881 >> Configuration saved in ./checkpoint-10000/generation_config.json [2023-11-19 21:52:53,458] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is about to be saved! [2023-11-19 21:52:53,512] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-10000/global_step10000/mp_rank_00_model_states.pt [2023-11-19 21:52:53,512] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-10000/global_step10000/mp_rank_00_model_states.pt... 
[INFO|modeling_utils.py:2194] 2023-11-19 21:52:53,418 >> Model weights saved in ./checkpoint-10000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-19 21:52:53,425 >> Feature extractor saved in ./checkpoint-10000/preprocessor_config.json [2023-11-19 21:52:59,540] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-10000/global_step10000/mp_rank_00_model_states.pt. [2023-11-19 21:52:59,549] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-19 21:53:22,734] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-19 21:53:22,753] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-19 21:53:22,753] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step10000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-19 21:54:32,178 >> Feature extractor saved in ./preprocessor_config.json 17%|███████████████▏ | 10019/60000 [10:58:44<165:01:51, 11.89s/it] 17%|███████████████▏ | 10039/60000 [11:02:23<146:52:26, 10.58s/it] 17%|███████████████▎ | 10059/60000 [11:05:55<158:39:37, 11.44s/it] 17%|███████████████▎ | 10079/60000 [11:09:28<143:24:57, 10.34s/it] 17%|███████████████▎ | 10099/60000 [11:13:02<149:34:55, 10.79s/it] 17%|███████████████▎ | 10119/60000 [11:16:34<141:55:26, 10.24s/it] 17%|███████████████▍ | 10139/60000 [11:20:02<143:23:32, 10.35s/it] 17%|███████████████▍ | 10159/60000 [11:23:19<135:25:24, 9.78s/it] 17%|███████████████▍ | 10179/60000 [11:26:37<137:22:09, 9.93s/it] 17%|███████████████▍ | 10199/60000 [11:30:00<139:42:09, 10.10s/it] 17%|███████████████▌ | 10220/60000 [11:33:37<140:06:51, 10.13s/it] 17%|███████████████▌ | 10239/60000 [11:36:48<141:16:10, 10.22s/it] 17%|███████████████▌ | 10259/60000 [11:40:17<150:23:19, 10.88s/it] 17%|███████████████▌ | 10279/60000 [11:44:01<140:41:57, 10.19s/it] 17%|███████████████▌ | 10299/60000 [11:47:44<156:58:26, 11.37s/it] 17%|███████████████▋ | 10320/60000 [11:51:16<137:54:36, 9.99s/it] 17%|███████████████▋ | 10340/60000 [11:54:44<142:10:13, 10.31s/it] 17%|███████████████▋ | 10360/60000 [11:58:09<140:19:16, 10.18s/it] Reading metadata...: 2165it [00:00, 12632.83it/s] | 10362/60000 [11:58:28<137:03:33, 9.94s/it] 17%|███████████████▋ | 10379/60000 [12:01:30<161:41:26, 11.73s/it] 17%|███████████████▊ | 10399/60000 [12:04:55<143:53:03, 10.44s/it] 17%|███████████████▊ | 10419/60000 [12:08:23<139:18:43, 10.12s/it] 17%|███████████████▊ | 10439/60000 [12:11:55<137:49:21, 10.01s/it] 17%|███████████████▊ | 10460/60000 [12:15:22<142:18:11, 10.34s/it] 17%|███████████████▉ | 10479/60000 [12:18:50<153:00:39, 11.12s/it] 18%|███████████████▉ | 10500/60000 [12:22:19<133:02:38, 9.68s/it] 18%|███████████████▉ | 10520/60000 [12:25:27<129:45:58, 9.44s/it] 18%|███████████████▉ | 10539/60000 
[12:28:28<130:20:20, 9.49s/it] 18%|████████████████ | 10559/60000 [12:31:41<134:37:09, 9.80s/it] 18%|████████████████ | 10579/60000 [12:34:49<129:32:14, 9.44s/it] 18%|████████████████ | 10599/60000 [12:37:57<127:45:17, 9.31s/it] 18%|████████████████ | 10619/60000 [12:41:05<129:00:56, 9.41s/it] 18%|████████████████▏ | 10639/60000 [12:44:25<130:16:10, 9.50s/it] 18%|████████████████▏ | 10660/60000 [12:47:48<129:30:24, 9.45s/it] 18%|████████████████▏ | 10680/60000 [12:51:38<149:32:50, 10.92s/it] 18%|████████████████▏ | 10699/60000 [12:54:40<133:14:28, 9.73s/it] 18%|████████████████▎ | 10719/60000 [12:57:50<129:21:54, 9.45s/it] 18%|████████████████▎ | 10740/60000 [13:01:16<134:45:45, 9.85s/it] 18%|████████████████▎ | 10759/60000 [13:04:17<128:00:11, 9.36s/it] 18%|████████████████▎ | 10780/60000 [13:07:36<130:59:59, 9.58s/it] 18%|████████████████▍ | 10800/60000 [13:10:47<132:53:34, 9.72s/it] 18%|████████████████▍ | 10820/60000 [13:14:08<148:40:56, 10.88s/it] 18%|████████████████▍ | 10840/60000 [13:17:20<134:16:19, 9.83s/it] Reading metadata...: 1650it [00:00, 9934.22it/s] | 10860/60000 [13:20:54<219:54:31, 16.11s/it] Reading metadata...: 1it [00:00, 6.44it/s] 18%|████████████████▍ | 10879/60000 [13:23:54<127:33:04, 9.35s/it] 18%|████████████████▌ | 10900/60000 [13:27:13<128:05:24, 9.39s/it] 18%|████████████████▌ | 10920/60000 [13:30:28<135:53:01, 9.97s/it] 18%|████████████████▌ | 10933/60000 [13:32:32<118:51:42, 8.72s/it] [2023-11-20 00:31:46,756] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 18%|████████████████▌ | 10940/60000 [13:33:36<129:13:52, 9.48s/it] 18%|████████████████▌ | 10960/60000 [13:36:46<129:47:38, 9.53s/it] 18%|████████████████▋ | 10980/60000 [13:40:07<129:57:36, 9.54s/it] 18%|████████████████▋ | 11000/60000 [13:43:48<167:40:52, 12.32s/it][INFO|trainer.py:3173] 2023-11-20 00:43:02,849 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-20 00:43:02,849 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-20 00:43:02,849 >> Batch size = 4 Reading metadata...: 1704it [00:00, 3745.53it/s] Reading metadata...: 1it [00:00, 2.49it/s] [INFO|trainer_utils.py:759] 2023-11-20 00:43:04,299 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 18%|████████████████▋ | 11000/60000 [13:54:19<167:40:52, 12.32s/it] 18%|████████████████▋ | 11000/60000 [13:54:19<167:40:52, 12.32s/it][INFO|trainer.py:2896] 2023-11-20 00:53:58,870 >> Saving model checkpoint to ./checkpoint-11000 [INFO|configuration_utils.py:462] 2023-11-20 00:53:58,878 >> Configuration saved in ./checkpoint-11000/config.json [INFO|configuration_utils.py:568] 2023-11-20 00:53:58,884 >> Configuration saved in ./checkpoint-11000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-20 00:54:41,102 >> Model weights saved in ./checkpoint-11000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-20 00:54:41,107 >> Feature extractor saved in ./checkpoint-11000/preprocessor_config.json [2023-11-20 00:54:41,131] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step11000 is about to be saved! 
[2023-11-20 00:54:41,160] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-11000/global_step11000/mp_rank_00_model_states.pt [2023-11-20 00:54:41,160] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-11000/global_step11000/mp_rank_00_model_states.pt... [2023-11-20 00:54:46,341] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-11000/global_step11000/mp_rank_00_model_states.pt. [2023-11-20 00:54:46,347] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-11000/global_step11000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-20 00:55:06,485] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-11000/global_step11000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-20 00:55:06,505] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-11000/global_step11000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-20 00:55:06,508] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step11000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-20 00:56:06,627 >> Feature extractor saved in ./preprocessor_config.json 18%|████████████████▋ | 11020/60000 [14:00:08<137:05:55, 10.08s/it] 18%|████████████████▋ | 11040/60000 [14:03:20<128:31:45, 9.45s/it] 18%|████████████████▊ | 11060/60000 [14:06:32<132:09:27, 9.72s/it] 18%|████████████████▊ | 11079/60000 [14:09:35<134:00:31, 9.86s/it] 18%|████████████████▊ | 11100/60000 [14:13:01<129:43:09, 9.55s/it] 19%|████████████████▊ | 11120/60000 [14:16:14<130:05:18, 9.58s/it] 19%|████████████████▉ | 11140/60000 [14:19:25<129:32:02, 9.54s/it] 19%|████████████████▉ | 11159/60000 [14:22:26<131:04:38, 9.66s/it] 19%|████████████████▉ | 11180/60000 [14:25:49<128:24:36, 9.47s/it] 19%|████████████████▉ | 11199/60000 [14:28:49<128:02:40, 9.45s/it] 19%|█████████████████ | 11219/60000 [14:32:00<131:09:43, 9.68s/it] 19%|█████████████████ | 11239/60000 [14:35:09<127:29:49, 9.41s/it] 19%|█████████████████ | 11259/60000 [14:38:33<128:37:15, 9.50s/it] 19%|█████████████████ | 11279/60000 [14:41:50<130:27:16, 9.64s/it] 19%|█████████████████▏ | 11299/60000 [14:45:12<135:59:30, 10.05s/it] 19%|█████████████████▏ | 11319/60000 [14:48:22<128:31:00, 9.50s/it] 19%|█████████████████▏ | 11339/60000 [14:51:43<135:27:58, 10.02s/it] 19%|█████████████████▏ | 11359/60000 [14:55:03<130:18:29, 9.64s/it] 19%|█████████████████▎ | 11379/60000 [14:58:17<132:11:37, 9.79s/it] 19%|█████████████████▎ | 11399/60000 [15:01:48<130:04:18, 9.63s/it] 19%|█████████████████▎ | 11419/60000 [15:05:00<127:58:51, 9.48s/it] 19%|█████████████████▎ | 11439/60000 [15:08:16<138:51:45, 10.29s/it] 19%|█████████████████▍ | 11459/60000 [15:11:27<127:14:02, 9.44s/it] 19%|█████████████████▍ | 11479/60000 [15:14:37<126:30:58, 9.39s/it] 19%|█████████████████▍ | 11499/60000 [15:17:47<126:46:41, 9.41s/it] 19%|█████████████████▍ | 11520/60000 [15:21:08<129:04:22, 9.58s/it] 19%|█████████████████▌ | 11539/60000 [15:24:14<126:42:01, 9.41s/it] 19%|█████████████████▌ | 11560/60000 
[15:27:33<125:45:44, 9.35s/it] 19%|█████████████████▌ | 11580/60000 [15:30:43<127:56:48, 9.51s/it] 19%|█████████████████▌ | 11600/60000 [15:33:53<127:34:58, 9.49s/it] 19%|█████████████████▌ | 11620/60000 [15:37:09<129:53:36, 9.67s/it] 19%|█████████████████▋ | 11640/60000 [15:40:29<129:41:52, 9.65s/it] 19%|█████████████████▋ | 11645/60000 [15:41:18<130:06:42, 9.69s/it] 19%|█████████████████▋ | 11659/60000 [15:43:30<131:28:22, 9.79s/it] Reading metadata...: 2165it [00:00, 5376.61it/s] | 11662/60000 [15:43:59<131:34:40, 9.80s/it] 19%|█████████████████▋ | 11679/60000 [15:46:42<129:03:41, 9.62s/it] 20%|█████████████████▋ | 11700/60000 [15:50:02<127:00:18, 9.47s/it] 20%|█████████████████▊ | 11720/60000 [15:53:18<128:35:46, 9.59s/it] 20%|█████████████████▊ | 11740/60000 [15:56:54<134:37:26, 10.04s/it] 20%|█████████████████▊ | 11760/60000 [16:00:06<129:41:47, 9.68s/it] 20%|█████████████████▊ | 11779/60000 [16:03:32<129:28:45, 9.67s/it] 20%|█████████████████▉ | 11800/60000 [16:06:57<126:20:45, 9.44s/it] 20%|█████████████████▉ | 11820/60000 [16:10:07<127:23:04, 9.52s/it] 20%|█████████████████▉ | 11840/60000 [16:13:17<126:22:03, 9.45s/it] Reading metadata...: 1650it [00:00, 10218.92it/s] | 11850/60000 [16:14:53<127:03:32, 9.50s/it] 20%|█████████████████▉ | 11860/60000 [16:16:28<128:16:17, 9.59s/it] 20%|██████████████████ | 11879/60000 [16:19:35<143:52:59, 10.76s/it] 20%|██████████████████ | 11899/60000 [16:22:48<132:18:23, 9.90s/it] 20%|██████████████████ | 11919/60000 [16:25:56<125:54:05, 9.43s/it] 20%|██████████████████ | 11940/60000 [16:29:17<126:30:23, 9.48s/it] 20%|██████████████████▏ | 11960/60000 [16:32:35<164:57:31, 12.36s/it] 20%|██████████████████▏ | 11980/60000 [16:35:49<125:55:22, 9.44s/it] 20%|██████████████████▏ | 12000/60000 [16:39:10<135:47:56, 10.18s/it][INFO|trainer.py:3173] 2023-11-20 03:38:24,759 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-20 03:38:24,759 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-20 03:38:24,759 >> Batch 
size = 4 Reading metadata...: 1704it [00:00, 8610.28it/s] Reading metadata...: 1it [00:00, 5.99it/s] [INFO|trainer_utils.py:759] 2023-11-20 03:38:26,118 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 20%|██████████████████▏ | 12000/60000 [16:49:34<135:47:56, 10.18s/it] 20%|██████████████████▏ | 12000/60000 [16:49:34<135:47:56, 10.18s/it][INFO|trainer.py:2896] 2023-11-20 03:49:18,392 >> Saving model checkpoint to ./checkpoint-12000 [INFO|configuration_utils.py:462] 2023-11-20 03:49:18,406 >> Configuration saved in ./checkpoint-12000/config.json [INFO|configuration_utils.py:568] 2023-11-20 03:49:18,411 >> Configuration saved in ./checkpoint-12000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-20 03:49:57,741 >> Model weights saved in ./checkpoint-12000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-20 03:49:57,747 >> Feature extractor saved in ./checkpoint-12000/preprocessor_config.json [2023-11-20 03:49:57,767] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step12000 is about to be saved! [2023-11-20 03:49:57,790] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-12000/global_step12000/mp_rank_00_model_states.pt [2023-11-20 03:49:57,791] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-12000/global_step12000/mp_rank_00_model_states.pt... [2023-11-20 03:50:04,981] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-12000/global_step12000/mp_rank_00_model_states.pt. 
[2023-11-20 03:50:04,997] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-12000/global_step12000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-20 03:50:31,874] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-12000/global_step12000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-20 03:50:31,883] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-12000/global_step12000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-20 03:50:31,883] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step12000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-20 03:51:33,054 >> Feature extractor saved in ./preprocessor_config.json 20%|██████████████████▏ | 12020/60000 [16:55:35<132:59:33, 9.98s/it] 20%|██████████████████▎ | 12039/60000 [16:58:40<130:00:35, 9.76s/it] 20%|██████████████████▎ | 12060/60000 [17:02:12<137:18:08, 10.31s/it] 20%|██████████████████▎ | 12079/60000 [17:05:15<129:09:24, 9.70s/it] 20%|██████████████████▎ | 12100/60000 [17:08:36<128:14:59, 9.64s/it] 20%|██████████████████▍ | 12120/60000 [17:11:48<130:46:22, 9.83s/it] 20%|██████████████████▍ | 12140/60000 [17:15:00<127:46:04, 9.61s/it] 20%|██████████████████▍ | 12160/60000 [17:18:17<126:17:21, 9.50s/it] 20%|██████████████████▍ | 12180/60000 [17:21:29<129:33:28, 9.75s/it] 20%|██████████████████▌ | 12200/60000 [17:24:39<125:57:17, 9.49s/it] 20%|██████████████████▌ | 12220/60000 [17:27:50<126:26:47, 9.53s/it] 20%|██████████████████▌ | 12240/60000 [17:31:08<131:30:31, 9.91s/it] 20%|██████████████████▌ | 12259/60000 [17:34:09<126:12:19, 9.52s/it] 20%|██████████████████▌ | 12280/60000 [17:37:30<127:57:47, 9.65s/it] 20%|██████████████████▋ | 12300/60000 [17:40:49<128:34:35, 9.70s/it] 21%|██████████████████▋ | 12320/60000 [17:44:05<144:35:00, 10.92s/it] 21%|██████████████████▋ | 12340/60000 [17:47:35<134:49:48, 10.18s/it] 21%|██████████████████▋ | 12359/60000 [17:50:34<124:13:26, 9.39s/it] 
21%|██████████████████▊ | 12379/60000 [17:53:46<125:38:40, 9.50s/it] 21%|██████████████████▊ | 12400/60000 [17:57:06<125:03:40, 9.46s/it] 21%|██████████████████▊ | 12419/60000 [18:00:14<126:20:41, 9.56s/it] 21%|██████████████████▊ | 12439/60000 [18:03:24<124:50:05, 9.45s/it] 21%|██████████████████▉ | 12459/60000 [18:07:11<176:09:46, 13.34s/it] 21%|██████████████████▉ | 12479/60000 [18:10:47<127:03:50, 9.63s/it] 21%|██████████████████▉ | 12499/60000 [18:14:05<131:48:32, 9.99s/it] 21%|██████████████████▉ | 12519/60000 [18:17:18<126:21:51, 9.58s/it] 21%|███████████████████ | 12539/60000 [18:20:28<124:21:34, 9.43s/it] 21%|███████████████████ | 12560/60000 [18:23:50<127:29:43, 9.68s/it] 21%|███████████████████ | 12580/60000 [18:27:02<126:40:18, 9.62s/it] 21%|███████████████████ | 12599/60000 [18:30:09<124:20:27, 9.44s/it] 21%|███████████████████▏ | 12619/60000 [18:33:23<125:20:29, 9.52s/it] 21%|███████████████████▏ | 12639/60000 [18:36:44<124:38:25, 9.47s/it] 21%|███████████████████▏ | 12660/60000 [18:40:05<124:10:38, 9.44s/it] 21%|███████████████████▏ | 12679/60000 [18:43:37<145:33:59, 11.07s/it] 21%|███████████████████▎ | 12699/60000 [18:46:48<125:09:06, 9.53s/it] 21%|███████████████████▎ | 12719/60000 [18:49:58<124:50:12, 9.51s/it] 21%|███████████████████▎ | 12739/60000 [18:53:07<123:34:28, 9.41s/it] 21%|███████████████████▎ | 12759/60000 [18:56:31<126:42:15, 9.66s/it] 21%|███████████████████▍ | 12779/60000 [18:59:48<124:56:43, 9.53s/it] 21%|███████████████████▍ | 12800/60000 [19:03:09<124:51:07, 9.52s/it] 21%|███████████████████▍ | 12819/60000 [19:06:11<124:09:41, 9.47s/it] Reading metadata...: 1650it [00:00, 10123.95it/s] | 12839/60000 [19:09:21<123:07:07, 9.40s/it] Reading metadata...: 1it [00:00, 6.48it/s] 21%|███████████████████▌ | 12859/60000 [19:12:39<126:01:44, 9.62s/it] 21%|███████████████████▌ | 12879/60000 [19:15:49<123:11:36, 9.41s/it] 21%|███████████████████▌ | 12899/60000 [19:19:00<123:55:39, 9.47s/it] 22%|███████████████████▌ | 12919/60000 
[19:22:11<123:44:09, 9.46s/it] 22%|███████████████████▌ | 12939/60000 [19:25:28<134:14:22, 10.27s/it] 22%|███████████████████▋ | 12959/60000 [19:28:56<166:09:50, 12.72s/it] Reading metadata...: 2165it [00:00, 13002.55it/s] | 12960/60000 [19:29:06<155:46:51, 11.92s/it] 22%|███████████████████▋ | 12979/60000 [19:32:19<128:14:56, 9.82s/it] 22%|███████████████████▋ | 12999/60000 [19:35:38<128:30:35, 9.84s/it] 22%|███████████████████▋ | 13000/60000 [19:35:48<127:15:33, 9.75s/it][INFO|trainer.py:3173] 2023-11-20 06:35:02,737 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-20 06:35:02,737 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-20 06:35:02,738 >> Batch size = 4 Reading metadata...: 1704it [00:00, 4016.64it/s] [INFO|trainer_utils.py:759] 2023-11-20 06:35:04,147 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 
{'eval_loss': 0.1693115234375, 'eval_wer': 8.725970599321522, 'eval_runtime': 631.033, 'eval_samples_per_second': 2.7, 'eval_steps_per_second': 0.675, 'epoch': 0.22} 22%|███████████████████▋ | 13000/60000 [19:46:19<127:15:33, 9.75s/it][INFO|trainer.py:2896] 2023-11-20 06:45:58,636 >> Saving model checkpoint to ./checkpoint-13000 [INFO|configuration_utils.py:462] 2023-11-20 06:45:58,650 >> Configuration saved in ./checkpoint-13000/config.json [INFO|configuration_utils.py:568] 2023-11-20 06:45:58,665 >> Configuration saved in ./checkpoint-13000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-20 06:46:38,481 >> Model weights saved in ./checkpoint-13000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-20 06:46:38,490 >> Feature extractor saved in ./checkpoint-13000/preprocessor_config.json [2023-11-20 06:46:38,507] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step13000 is about to be saved! [2023-11-20 06:46:38,532] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-13000/global_step13000/mp_rank_00_model_states.pt [2023-11-20 06:46:38,532] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-13000/global_step13000/mp_rank_00_model_states.pt... [2023-11-20 06:46:47,055] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-13000/global_step13000/mp_rank_00_model_states.pt. [2023-11-20 06:46:47,061] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-13000/global_step13000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-20 06:47:09,337] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-13000/global_step13000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2023-11-20 06:47:09,361] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-13000/global_step13000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-20 06:47:09,364] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step13000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-20 06:48:10,954 >> Feature extractor saved in ./preprocessor_config.json 22%|███████████████████▋ | 13019/60000 [19:52:08<134:57:33, 10.34s/it] 22%|███████████████████▊ | 13040/60000 [19:55:46<130:40:58, 10.02s/it] 22%|███████████████████▊ | 13059/60000 [19:58:58<131:05:35, 10.05s/it] 22%|███████████████████▊ | 13080/60000 [20:02:25<129:14:20, 9.92s/it] 22%|███████████████████▊ | 13099/60000 [20:05:33<131:13:16, 10.07s/it] 22%|███████████████████▉ | 13120/60000 [20:09:06<129:34:05, 9.95s/it] 22%|███████████████████▉ | 13140/60000 [20:12:24<127:07:15, 9.77s/it] 22%|███████████████████▉ | 13159/60000 [20:15:29<125:44:16, 9.66s/it] 22%|███████████████████▉ | 13179/60000 [20:18:46<126:30:11, 9.73s/it] 22%|████████████████████ | 13199/60000 [20:22:00<124:18:57, 9.56s/it] 22%|████████████████████ | 13219/60000 [20:25:42<125:09:38, 9.63s/it] 22%|████████████████████ | 13240/60000 [20:29:07<126:38:56, 9.75s/it] 22%|████████████████████ | 13259/60000 [20:32:14<130:03:10, 10.02s/it] 22%|████████████████████▏ | 13280/60000 [20:35:39<125:46:13, 9.69s/it] 22%|████████████████████▏ | 13300/60000 [20:39:29<128:37:00, 9.91s/it] 22%|████████████████████▏ | 13319/60000 [20:42:33<123:50:06, 9.55s/it] 22%|████████████████████▏ | 13340/60000 [20:45:57<127:58:57, 9.87s/it] 22%|████████████████████▎ | 13360/60000 [20:49:11<125:04:39, 9.65s/it] 22%|████████████████████▎ | 13379/60000 [20:52:20<140:04:59, 10.82s/it] 22%|████████████████████▎ | 13400/60000 [20:55:42<124:43:41, 9.64s/it] 22%|████████████████████▎ | 13419/60000 [20:58:46<125:22:23, 9.69s/it] 22%|████████████████████▍ | 13440/60000 [21:02:13<128:18:42, 9.92s/it] 22%|████████████████████▍ | 
13460/60000 [21:05:27<126:14:41, 9.77s/it] 22%|████████████████████▍ | 13480/60000 [21:08:44<123:45:21, 9.58s/it] 22%|████████████████████▍ | 13500/60000 [21:12:09<147:21:07, 11.41s/it] 23%|████████████████████▌ | 13520/60000 [21:15:22<124:54:36, 9.67s/it] 23%|████████████████████▌ | 13540/60000 [21:18:34<124:34:04, 9.65s/it] 23%|████████████████████▌ | 13560/60000 [21:21:55<133:28:49, 10.35s/it] 23%|████████████████████▌ | 13579/60000 [21:24:58<122:28:31, 9.50s/it] 23%|████████████████████▋ | 13600/60000 [21:28:47<126:47:15, 9.84s/it] 23%|████████████████████▋ | 13620/60000 [21:32:01<126:30:05, 9.82s/it] 23%|████████████████████▋ | 13640/60000 [21:35:24<127:36:14, 9.91s/it] 23%|████████████████████▋ | 13647/60000 [21:36:33<115:01:26, 8.93s/it] 23%|████████████████████▋ | 13648/60000 [21:36:39<104:28:23, 8.11s/it] 23%|████████████████████▋ | 13660/60000 [21:38:35<123:18:25, 9.58s/it] 23%|████████████████████▋ | 13679/60000 [21:41:38<125:17:51, 9.74s/it] 23%|████████████████████▊ | 13699/60000 [21:44:52<124:53:37, 9.71s/it] 23%|████████████████████▊ | 13719/60000 [21:48:04<123:11:25, 9.58s/it] 23%|████████████████████▊ | 13739/60000 [21:51:23<127:01:16, 9.88s/it] 23%|████████████████████▊ | 13759/60000 [21:54:37<122:14:40, 9.52s/it] 23%|████████████████████▉ | 13779/60000 [21:57:52<125:08:01, 9.75s/it] 23%|████████████████████▉ | 13799/60000 [22:01:07<123:16:04, 9.61s/it] 23%|████████████████████▉ | 13819/60000 [22:04:26<138:20:38, 10.78s/it] Reading metadata...: 1650it [00:00, 9278.34it/s] | 13829/60000 [22:06:02<122:51:56, 9.58s/it] 23%|████████████████████▉ | 13839/60000 [22:07:40<123:50:24, 9.66s/it] 23%|█████████████████████ | 13859/60000 [22:10:53<122:54:09, 9.59s/it] 23%|█████████████████████ | 13880/60000 [22:14:17<121:02:34, 9.45s/it] 23%|█████████████████████ | 13899/60000 [22:17:24<123:37:22, 9.65s/it] 23%|█████████████████████ | 13919/60000 [22:20:46<124:50:47, 9.75s/it] 23%|█████████████████████▏ | 13940/60000 [22:24:34<156:09:49, 12.21s/it] 
23%|█████████████████████▏ | 13959/60000 [22:27:49<151:22:34, 11.84s/it] 23%|█████████████████████▏ | 13979/60000 [22:31:06<125:24:30, 9.81s/it] 23%|█████████████████████▏ | 13999/60000 [22:34:30<130:03:06, 10.18s/it] 23%|█████████████████████▏ | 14000/60000 [22:34:40<129:19:52, 10.12s/it][INFO|trainer.py:3173] 2023-11-20 09:33:54,675 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-20 09:33:54,675 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-20 09:33:54,675 >> Batch size = 4 Reading metadata...: 1704it [00:00, 8398.78it/s] [INFO|trainer_utils.py:759] 2023-11-20 09:33:55,638 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 23%|█████████████████████▏ | 14000/60000 [22:45:17<129:19:52, 10.12s/it] 23%|█████████████████████▏ | 14000/60000 [22:45:17<129:19:52, 10.12s/it][INFO|trainer.py:2896] 2023-11-20 09:44:58,898 >> Saving model checkpoint to ./checkpoint-14000 [INFO|configuration_utils.py:462] 2023-11-20 09:44:58,912 >> Configuration saved in ./checkpoint-14000/config.json [INFO|configuration_utils.py:568] 2023-11-20 09:44:58,919 >> Configuration saved in ./checkpoint-14000/generation_config.json [2023-11-20 09:45:41,108] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step14000 is about to be saved! [2023-11-20 09:45:41,135] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-14000/global_step14000/mp_rank_00_model_states.pt [2023-11-20 09:45:41,135] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-14000/global_step14000/mp_rank_00_model_states.pt... 
[INFO|modeling_utils.py:2194] 2023-11-20 09:45:41,087 >> Model weights saved in ./checkpoint-14000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-20 09:45:41,093 >> Feature extractor saved in ./checkpoint-14000/preprocessor_config.json [2023-11-20 09:45:51,131] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-14000/global_step14000/mp_rank_00_model_states.pt. [2023-11-20 09:45:51,141] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-14000/global_step14000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-20 09:46:15,181] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-14000/global_step14000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-20 09:46:15,205] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-14000/global_step14000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-20 09:46:15,206] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step14000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-20 09:47:17,289 >> Feature extractor saved in ./preprocessor_config.json 23%|█████████████████████▎ | 14019/60000 [22:51:15<133:10:07, 10.43s/it] 23%|█████████████████████▎ | 14040/60000 [22:54:46<127:56:23, 10.02s/it] 23%|█████████████████████▎ | 14059/60000 [22:58:00<129:41:05, 10.16s/it] 23%|█████████████████████▎ | 14079/60000 [23:01:23<128:05:10, 10.04s/it] 24%|█████████████████████▍ | 14100/60000 [23:04:59<128:43:17, 10.10s/it] 24%|█████████████████████▍ | 14120/60000 [23:08:17<126:34:47, 9.93s/it] 24%|█████████████████████▍ | 14140/60000 [23:11:34<125:29:23, 9.85s/it] 24%|█████████████████████▍ | 14159/60000 [23:14:42<125:56:53, 9.89s/it] 24%|█████████████████████▌ | 14180/60000 [23:18:14<125:44:12, 9.88s/it] 24%|█████████████████████▌ | 14199/60000 [23:21:21<124:58:04, 9.82s/it] 24%|█████████████████████▌ | 14220/60000 [23:24:48<126:43:30, 9.97s/it] 24%|█████████████████████▌ | 14240/60000 [23:29:11<308:21:31, 24.26s/it] Reading metadata...: 2165it [00:00, 12900.23it/s] | 14259/60000 [23:32:35<131:20:18, 10.34s/it] 24%|█████████████████████▋ | 14260/60000 [23:32:50<149:30:14, 11.77s/it] 24%|█████████████████████▋ | 14280/60000 [23:36:08<127:23:36, 10.03s/it] 24%|█████████████████████▋ | 14300/60000 [23:39:34<126:05:09, 9.93s/it] 24%|█████████████████████▋ | 14320/60000 [23:42:52<126:08:11, 9.94s/it] 24%|█████████████████████▋ | 14339/60000 [23:45:57<122:52:54, 9.69s/it] 24%|█████████████████████▊ | 14360/60000 [23:49:32<124:47:32, 9.84s/it] 24%|█████████████████████▊ | 14379/60000 [23:52:39<123:56:35, 9.78s/it] 24%|█████████████████████▊ | 14400/60000 [23:56:06<124:00:08, 9.79s/it] 24%|█████████████████████▊ | 14420/60000 [23:59:25<125:27:16, 9.91s/it] 24%|█████████████████████▉ | 14440/60000 [24:02:49<130:15:51, 10.29s/it] 24%|█████████████████████▉ | 14460/60000 [24:06:06<122:52:18, 9.71s/it] 24%|█████████████████████▉ | 14480/60000 [24:09:21<124:31:00, 9.85s/it] 24%|█████████████████████▉ | 14499/60000 
[24:12:54<125:38:19, 9.94s/it] 24%|██████████████████████ | 14520/60000 [24:16:22<126:11:35, 9.99s/it] 24%|██████████████████████ | 14540/60000 [24:19:46<124:04:40, 9.83s/it] 24%|██████████████████████ | 14560/60000 [24:23:02<121:38:15, 9.64s/it] 24%|██████████████████████ | 14580/60000 [24:26:28<133:24:41, 10.57s/it] 24%|██████████████████████▏ | 14600/60000 [24:29:55<128:41:16, 10.20s/it] 24%|██████████████████████▏ | 14620/60000 [24:33:22<126:14:04, 10.01s/it] 24%|██████████████████████▏ | 14640/60000 [24:36:54<127:16:39, 10.10s/it] 24%|██████████████████████▏ | 14649/60000 [24:38:19<110:35:44, 8.78s/it] 24%|██████████████████████▏ | 14650/60000 [24:38:26<101:09:20, 8.03s/it] 24%|██████████████████████▏ | 14660/60000 [24:40:07<129:30:35, 10.28s/it] 24%|██████████████████████▎ | 14680/60000 [24:43:46<133:10:35, 10.58s/it] 24%|██████████████████████▎ | 14700/60000 [24:47:06<125:31:06, 9.97s/it] 25%|██████████████████████▎ | 14720/60000 [24:50:30<124:25:19, 9.89s/it] 25%|██████████████████████▎ | 14739/60000 [24:53:39<122:50:27, 9.77s/it] 25%|██████████████████████▍ | 14759/60000 [24:56:58<125:58:05, 10.02s/it] 25%|██████████████████████▍ | 14780/60000 [25:00:29<126:12:22, 10.05s/it] 25%|██████████████████████▍ | 14800/60000 [25:03:55<125:59:26, 10.03s/it] Reading metadata...: 1650it [00:00, 2886.73it/s] | 14819/60000 [25:07:04<126:54:06, 10.11s/it] Reading metadata...: 1it [00:00, 1.78it/s] 25%|██████████████████████▌ | 14840/60000 [25:10:33<125:44:52, 10.02s/it] 25%|██████████████████████▌ | 14860/60000 [25:13:51<124:44:02, 9.95s/it] 25%|██████████████████████▌ | 14880/60000 [25:17:14<127:27:56, 10.17s/it] 25%|██████████████████████▌ | 14900/60000 [25:20:30<121:16:45, 9.68s/it] 25%|██████████████████████▋ | 14919/60000 [25:23:36<122:29:33, 9.78s/it] 25%|██████████████████████▋ | 14940/60000 [25:27:05<125:47:10, 10.05s/it] 25%|██████████████████████▋ | 14960/60000 [25:30:33<134:12:31, 10.73s/it] 25%|██████████████████████▋ | 14980/60000 [25:33:58<125:29:09, 
10.03s/it] 25%|██████████████████████▊ | 15000/60000 [25:37:16<123:37:53, 9.89s/it][INFO|trainer.py:3173] 2023-11-20 12:36:31,337 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-20 12:36:31,338 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-20 12:36:31,338 >> Batch size = 4 {'loss': 0.0457, 'learning_rate': 2.28935593220339e-06, 'epoch': 0.25} Reading metadata...: 1704it [00:00, 9939.61it/s] [INFO|trainer_utils.py:759] 2023-11-20 12:36:32,322 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. {'eval_loss': 0.1861572265625, 'eval_wer': 8.886166603844705, 'eval_runtime': 645.1584, 'eval_samples_per_second': 2.641, 'eval_steps_per_second': 0.66, 'epoch': 0.25} 25%|██████████████████████▊ | 15000/60000 [25:48:01<123:37:53, 9.89s/it][INFO|trainer.py:2896] 2023-11-20 12:47:43,800 >> Saving model checkpoint to ./checkpoint-15000 [INFO|configuration_utils.py:462] 2023-11-20 12:47:43,810 >> Configuration saved in ./checkpoint-15000/config.json [INFO|configuration_utils.py:568] 2023-11-20 12:47:43,820 >> Configuration saved in ./checkpoint-15000/generation_config.json [2023-11-20 12:48:48,507] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step15000 is about to be saved! [2023-11-20 12:48:48,535] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-15000/global_step15000/mp_rank_00_model_states.pt [2023-11-20 12:48:48,536] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-15000/global_step15000/mp_rank_00_model_states.pt... 
[INFO|modeling_utils.py:2194] 2023-11-20 12:48:48,477 >> Model weights saved in ./checkpoint-15000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-20 12:48:48,484 >> Feature extractor saved in ./checkpoint-15000/preprocessor_config.json [2023-11-20 12:48:57,690] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-15000/global_step15000/mp_rank_00_model_states.pt. [2023-11-20 12:48:57,696] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-15000/global_step15000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-20 12:49:31,425] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-15000/global_step15000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-20 12:49:31,437] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-15000/global_step15000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-20 12:49:31,439] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step15000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-20 12:50:34,480 >> Feature extractor saved in ./preprocessor_config.json 25%|██████████████████████▊ | 15019/60000 [25:54:54<135:21:04, 10.83s/it] 25%|██████████████████████▊ | 15039/60000 [25:58:21<130:36:24, 10.46s/it] 25%|██████████████████████▊ | 15060/60000 [26:02:04<136:17:55, 10.92s/it] 25%|██████████████████████▊ | 15079/60000 [26:05:17<125:41:51, 10.07s/it] 25%|██████████████████████▉ | 15099/60000 [26:08:38<125:30:24, 10.06s/it] 25%|██████████████████████▉ | 15119/60000 [26:12:02<127:07:57, 10.20s/it] 25%|██████████████████████▉ | 15139/60000 [26:15:26<126:49:28, 10.18s/it] 25%|██████████████████████▉ | 15159/60000 [26:18:48<122:07:48, 9.81s/it] 25%|███████████████████████ | 15179/60000 [26:22:09<124:31:18, 10.00s/it] 25%|███████████████████████ | 15199/60000 [26:25:48<124:48:13, 10.03s/it] 25%|███████████████████████ | 15219/60000 [26:29:08<122:08:02, 9.82s/it] 25%|███████████████████████ | 15239/60000 [26:32:34<127:43:07, 10.27s/it] 25%|███████████████████████▏ | 15249/60000 [26:34:13<123:08:35, 9.91s/it] 25%|███████████████████████▏ | 15259/60000 [26:35:50<121:58:03, 9.81s/it] 25%|███████████████████████▏ | 15279/60000 [26:39:09<124:55:39, 10.06s/it] 25%|███████████████████████▏ | 15299/60000 [26:42:39<125:00:43, 10.07s/it] 26%|███████████████████████▏ | 15319/60000 [26:46:04<137:10:02, 11.05s/it] 26%|███████████████████████▎ | 15339/60000 [26:49:23<125:08:02, 10.09s/it] 26%|███████████████████████▎ | 15359/60000 [26:52:41<122:17:35, 9.86s/it] 26%|███████████████████████▎ | 15379/60000 [26:55:59<123:57:40, 10.00s/it] 26%|███████████████████████▎ | 15399/60000 [26:59:39<153:18:26, 12.37s/it] 26%|███████████████████████▍ | 15420/60000 [27:03:44<127:02:07, 10.26s/it] 26%|███████████████████████▍ | 15439/60000 [27:06:56<124:13:52, 10.04s/it] 26%|███████████████████████▍ | 15459/60000 [27:10:15<122:25:26, 9.89s/it] 26%|███████████████████████▍ | 15479/60000 [27:13:36<122:32:48, 9.91s/it] 
26%|███████████████████████▌ | 15499/60000 [27:17:02<128:35:09, 10.40s/it] 26%|███████████████████████▌ | 15519/60000 [27:20:21<123:59:25, 10.03s/it] 26%|███████████████████████▌ | 15539/60000 [27:23:39<122:29:31, 9.92s/it] 26%|███████████████████████▌ | 15558/60000 [27:26:48<121:03:39, 9.81s/it] 26%|███████████████████████▌ | 15559/60000 [27:27:00<129:50:35, 10.52s/it] 26%|███████████████████████▋ | 15580/60000 [27:30:27<121:34:25, 9.85s/it] 26%|███████████████████████▋ | 15599/60000 [27:33:41<125:20:44, 10.16s/it] 26%|███████████████████████▋ | 15620/60000 [27:37:07<120:46:23, 9.80s/it] 26%|███████████████████████▋ | 15639/60000 [27:40:25<122:24:28, 9.93s/it] 26%|███████████████████████▋ | 15659/60000 [27:43:43<120:40:01, 9.80s/it] 26%|███████████████████████▊ | 15679/60000 [27:47:05<123:43:56, 10.05s/it] 26%|███████████████████████▊ | 15700/60000 [27:50:35<125:53:05, 10.23s/it] 26%|███████████████████████▊ | 15720/60000 [27:53:54<121:59:52, 9.92s/it] 26%|███████████████████████▊ | 15739/60000 [27:57:04<121:42:08, 9.90s/it] 26%|███████████████████████▉ | 15760/60000 [28:00:49<137:33:43, 11.19s/it] 26%|███████████████████████▉ | 15780/60000 [28:04:08<120:16:35, 9.79s/it] 26%|███████████████████████▉ | 15800/60000 [28:07:25<122:10:14, 9.95s/it] Reading metadata...: 1650it [00:01, 1460.36it/s] | 15810/60000 [28:09:04<121:14:12, 9.88s/it] 26%|███████████████████████▉ | 15820/60000 [28:10:48<122:37:21, 9.99s/it] 26%|████████████████████████ | 15839/60000 [28:13:57<121:06:45, 9.87s/it] 26%|████████████████████████ | 15859/60000 [28:17:20<121:02:53, 9.87s/it] 26%|████████████████████████ | 15880/60000 [28:20:47<119:09:12, 9.72s/it] 26%|████████████████████████ | 15899/60000 [28:23:54<121:52:51, 9.95s/it] 27%|████████████████████████▏ | 15919/60000 [28:27:12<121:34:40, 9.93s/it] 27%|████████████████████████▏ | 15940/60000 [28:30:48<126:55:25, 10.37s/it] 27%|████████████████████████▏ | 15959/60000 [28:34:07<131:27:57, 10.75s/it] 27%|████████████████████████▏ | 15980/60000 
[28:37:36<124:38:25, 10.19s/it] 27%|████████████████████████▎ | 15999/60000 [28:40:45<122:51:46, 10.05s/it] 27%|████████████████████████▎ | 16000/60000 [28:40:55<121:30:36, 9.94s/it][INFO|trainer.py:3173] 2023-11-20 15:40:10,132 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-20 15:40:10,132 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-20 15:40:10,132 >> Batch size = 4 Reading metadata...: 1704it [00:00, 9310.25it/s] [INFO|trainer_utils.py:759] 2023-11-20 15:40:11,101 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 27%|████████████████████████▎ | 16000/60000 [28:51:51<121:30:36, 9.94s/it] 27%|████████████████████████▎ | 16000/60000 [28:51:51<121:30:36, 9.94s/it][INFO|trainer.py:2896] 2023-11-20 15:51:37,453 >> Saving model checkpoint to ./checkpoint-16000 [INFO|configuration_utils.py:462] 2023-11-20 15:51:37,471 >> Configuration saved in ./checkpoint-16000/config.json [INFO|configuration_utils.py:568] 2023-11-20 15:51:37,477 >> Configuration saved in ./checkpoint-16000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-20 15:52:25,487 >> Model weights saved in ./checkpoint-16000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-20 15:52:25,493 >> Feature extractor saved in ./checkpoint-16000/preprocessor_config.json [2023-11-20 15:52:25,524] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step16000 is about to be saved! 
[2023-11-20 15:52:25,555] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-16000/global_step16000/mp_rank_00_model_states.pt [2023-11-20 15:52:25,555] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-16000/global_step16000/mp_rank_00_model_states.pt... [2023-11-20 15:52:39,617] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-16000/global_step16000/mp_rank_00_model_states.pt. [2023-11-20 15:52:39,632] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-16000/global_step16000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-20 15:53:42,826] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-16000/global_step16000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-20 15:53:42,836] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-16000/global_step16000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-20 15:53:42,837] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step16000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-20 15:54:46,846 >> Feature extractor saved in ./preprocessor_config.json 27%|████████████████████████▎ | 16020/60000 [28:59:05<126:47:10, 10.38s/it] 27%|████████████████████████▎ | 16040/60000 [29:03:18<122:57:48, 10.07s/it] 27%|████████████████████████▎ | 16060/60000 [29:06:43<126:07:15, 10.33s/it] 27%|████████████████████████▍ | 16080/60000 [29:10:04<120:15:14, 9.86s/it] 27%|████████████████████████▍ | 16100/60000 [29:13:24<122:01:30, 10.01s/it] 27%|████████████████████████▍ | 16120/60000 [29:16:50<123:15:58, 10.11s/it] 27%|████████████████████████▍ | 16140/60000 [29:20:33<154:05:18, 12.65s/it] 27%|████████████████████████▌ | 16159/60000 [29:24:08<122:48:59, 10.09s/it] 27%|████████████████████████▌ | 16180/60000 [29:27:36<122:13:32, 10.04s/it] 27%|████████████████████████▌ | 16200/60000 [29:31:09<143:48:50, 11.82s/it] 27%|████████████████████████▌ | 16219/60000 [29:34:28<127:38:53, 10.50s/it] 27%|████████████████████████▋ | 16239/60000 [29:37:53<123:30:41, 10.16s/it] 27%|████████████████████████▋ | 16259/60000 [29:41:15<123:45:20, 10.19s/it] 27%|████████████████████████▋ | 16279/60000 [29:45:02<132:20:33, 10.90s/it] 27%|████████████████████████▋ | 16299/60000 [29:48:46<125:29:30, 10.34s/it] 27%|████████████████████████▊ | 16319/60000 [29:52:31<128:38:16, 10.60s/it] 27%|████████████████████████▊ | 16339/60000 [29:55:54<122:43:55, 10.12s/it] 27%|████████████████████████▊ | 16342/60000 [29:56:25<124:03:14, 10.23s/it] 27%|████████████████████████▊ | 16359/60000 [29:59:15<122:21:04, 10.09s/it] 27%|████████████████████████▊ | 16368/60000 [30:00:43<108:23:10, 8.94s/it] 27%|████████████████████████▊ | 16380/60000 [30:02:51<125:11:45, 10.33s/it] 27%|████████████████████████▊ | 16399/60000 [30:06:02<121:19:57, 10.02s/it] 27%|████████████████████████▉ | 16419/60000 [30:09:25<123:04:44, 10.17s/it] 27%|████████████████████████▉ | 16439/60000 [30:12:45<119:06:50, 9.84s/it] 27%|████████████████████████▉ | 16459/60000 
[30:16:07<121:10:38, 10.02s/it] 27%|████████████████████████▉ | 16479/60000 [30:19:34<123:35:09, 10.22s/it] 27%|█████████████████████████ | 16499/60000 [30:22:55<119:45:21, 9.91s/it] 28%|█████████████████████████ | 16519/60000 [30:26:37<121:59:39, 10.10s/it] 28%|█████████████████████████ | 16539/60000 [30:29:57<119:07:17, 9.87s/it] 28%|█████████████████████████ | 16559/60000 [30:33:24<123:19:17, 10.22s/it] 28%|█████████████████████████▏ | 16579/60000 [30:36:45<122:51:48, 10.19s/it] 28%|█████████████████████████▏ | 16599/60000 [30:40:05<120:01:23, 9.96s/it] 28%|█████████████████████████▏ | 16619/60000 [30:43:25<120:45:41, 10.02s/it] 28%|█████████████████████████▏ | 16640/60000 [30:47:04<123:43:58, 10.27s/it] 28%|█████████████████████████▎ | 16659/60000 [30:50:21<120:27:20, 10.01s/it] 28%|█████████████████████████▎ | 16679/60000 [30:53:42<119:07:12, 9.90s/it] 28%|█████████████████████████▎ | 16700/60000 [30:57:11<119:40:19, 9.95s/it] 28%|█████████████████████████▎ | 16719/60000 [31:00:21<117:32:45, 9.78s/it] 28%|█████████████████████████▍ | 16739/60000 [31:03:46<119:11:33, 9.92s/it] 28%|█████████████████████████▍ | 16759/60000 [31:07:07<120:05:49, 10.00s/it] 28%|█████████████████████████▍ | 16780/60000 [31:11:20<130:32:17, 10.87s/it] Reading metadata...: 1650it [00:00, 9681.06it/s] | 16799/60000 [31:14:40<124:14:11, 10.35s/it] 28%|█████████████████████████▍ | 16800/60000 [31:14:52<131:19:10, 10.94s/it] 28%|█████████████████████████▌ | 16820/60000 [31:18:26<131:51:50, 10.99s/it] 28%|█████████████████████████▌ | 16839/60000 [31:21:44<121:46:12, 10.16s/it] 28%|█████████████████████████▌ | 16857/60000 [31:24:44<121:44:56, 10.16s/it] 28%|█████████████████████████▌ | 16859/60000 [31:25:08<129:12:41, 10.78s/it] 28%|█████████████████████████▌ | 16880/60000 [31:29:04<148:33:27, 12.40s/it] 28%|█████████████████████████▋ | 16899/60000 [31:32:18<120:37:45, 10.08s/it] 28%|█████████████████████████▋ | 16919/60000 [31:35:51<128:47:47, 10.76s/it] 28%|█████████████████████████▋ | 
16939/60000 [31:39:27<127:11:31, 10.63s/it] 28%|█████████████████████████▋ | 16959/60000 [31:43:12<138:13:14, 11.56s/it] 28%|█████████████████████████▊ | 16980/60000 [31:46:55<127:12:29, 10.65s/it] 28%|█████████████████████████▊ | 16999/60000 [31:50:26<133:32:15, 11.18s/it] 28%|█████████████████████████▊ | 17000/60000 [31:50:37<133:22:18, 11.17s/it][INFO|trainer.py:3173] 2023-11-20 18:49:52,335 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-20 18:49:52,336 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-20 18:49:52,336 >> Batch size = 4 Reading metadata...: 1704it [00:00, 2782.72it/s] [INFO|trainer_utils.py:759] 2023-11-20 18:49:55,080 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 
{'eval_loss': 0.17822265625, 'eval_wer': 8.46211835657746, 'eval_runtime': 701.0277, 'eval_samples_per_second': 2.431, 'eval_steps_per_second': 0.608, 'epoch': 0.28} 28%|█████████████████████████▊ | 17000/60000 [32:02:18<133:22:18, 11.17s/it][INFO|trainer.py:2896] 2023-11-20 19:02:01,287 >> Saving model checkpoint to ./checkpoint-17000 [INFO|configuration_utils.py:462] 2023-11-20 19:02:01,296 >> Configuration saved in ./checkpoint-17000/config.json [INFO|configuration_utils.py:568] 2023-11-20 19:02:01,301 >> Configuration saved in ./checkpoint-17000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-20 19:02:42,296 >> Model weights saved in ./checkpoint-17000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-20 19:02:42,302 >> Feature extractor saved in ./checkpoint-17000/preprocessor_config.json [2023-11-20 19:02:42,327] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step17000 is about to be saved! [2023-11-20 19:02:42,356] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-17000/global_step17000/mp_rank_00_model_states.pt [2023-11-20 19:02:42,356] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-17000/global_step17000/mp_rank_00_model_states.pt... [2023-11-20 19:02:48,218] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-17000/global_step17000/mp_rank_00_model_states.pt. [2023-11-20 19:02:48,224] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-17000/global_step17000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-20 19:03:09,226] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-17000/global_step17000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2023-11-20 19:03:09,242] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-17000/global_step17000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-20 19:03:09,242] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step17000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-20 19:04:16,047 >> Feature extractor saved in ./preprocessor_config.json 28%|█████████████████████████▊ | 17019/60000 [32:08:20<130:17:55, 10.91s/it] 28%|█████████████████████████▊ | 17040/60000 [32:12:10<131:16:58, 11.00s/it] 28%|█████████████████████████▊ | 17059/60000 [32:15:35<126:48:28, 10.63s/it] 28%|█████████████████████████▉ | 17079/60000 [32:19:07<129:01:51, 10.82s/it] 28%|█████████████████████████▉ | 17100/60000 [32:23:09<119:08:31, 10.00s/it] 29%|█████████████████████████▉ | 17120/60000 [32:26:39<126:38:31, 10.63s/it] 29%|█████████████████████████▉ | 17139/60000 [32:30:02<126:25:22, 10.62s/it] 29%|██████████████████████████ | 17160/60000 [32:33:43<121:42:38, 10.23s/it] 29%|██████████████████████████ | 17179/60000 [32:37:14<129:43:00, 10.91s/it] 29%|██████████████████████████ | 17200/60000 [32:40:58<126:19:25, 10.63s/it] 29%|██████████████████████████ | 17220/60000 [32:44:48<121:41:20, 10.24s/it] 29%|██████████████████████████▏ | 17240/60000 [32:48:10<121:17:48, 10.21s/it] 29%|██████████████████████████▏ | 17260/60000 [32:51:46<133:43:41, 11.26s/it] 29%|██████████████████████████▏ | 17280/60000 [32:55:19<124:13:09, 10.47s/it] 29%|██████████████████████████▏ | 17300/60000 [32:58:50<115:22:34, 9.73s/it] 29%|██████████████████████████▎ | 17319/60000 [33:01:58<119:29:26, 10.08s/it] 29%|██████████████████████████▎ | 17340/60000 [33:05:27<116:57:49, 9.87s/it] 29%|██████████████████████████▎ | 17360/60000 [33:08:51<117:19:02, 9.90s/it] 29%|██████████████████████████▎ | 17379/60000 [33:11:59<116:31:25, 9.84s/it] 29%|██████████████████████████▍ | 17399/60000 [33:15:15<115:06:32, 9.73s/it] 29%|██████████████████████████▍ | 
17419/60000 [33:18:36<116:15:55, 9.83s/it] 29%|██████████████████████████▍ | 17439/60000 [33:22:01<122:21:14, 10.35s/it] 29%|██████████████████████████▍ | 17459/60000 [33:25:20<119:52:56, 10.14s/it] 29%|██████████████████████████▌ | 17479/60000 [33:28:38<117:50:22, 9.98s/it] 29%|██████████████████████████▌ | 17499/60000 [33:31:59<117:25:52, 9.95s/it] 29%|██████████████████████████▌ | 17519/60000 [33:35:21<118:30:37, 10.04s/it] 29%|██████████████████████████▌ | 17540/60000 [33:38:54<118:32:10, 10.05s/it] 29%|██████████████████████████▋ | 17559/60000 [33:42:04<117:47:53, 9.99s/it] 29%|██████████████████████████▋ | 17579/60000 [33:45:19<114:13:38, 9.69s/it] 29%|██████████████████████████▋ | 17599/60000 [33:48:37<114:49:50, 9.75s/it] 29%|██████████████████████████▋ | 17620/60000 [33:53:09<136:30:35, 11.60s/it] 29%|██████████████████████████▊ | 17639/60000 [33:56:29<115:37:18, 9.83s/it] 29%|██████████████████████████▊ | 17659/60000 [33:59:47<117:19:13, 9.98s/it] 29%|██████████████████████████▊ | 17680/60000 [34:03:21<115:27:02, 9.82s/it] 30%|██████████████████████████▊ | 17700/60000 [34:06:49<135:01:39, 11.49s/it] 30%|██████████████████████████▉ | 17720/60000 [34:10:08<114:22:05, 9.74s/it] 30%|██████████████████████████▉ | 17739/60000 [34:13:17<117:01:45, 9.97s/it] 30%|██████████████████████████▉ | 17759/60000 [34:16:37<118:57:18, 10.14s/it] 30%|██████████████████████████▉ | 17779/60000 [34:19:56<115:35:07, 9.86s/it] Reading metadata...: 1650it [00:00, 9802.57it/s] | 17790/60000 [34:21:51<122:47:12, 10.47s/it] 30%|██████████████████████████▉ | 17799/60000 [34:23:21<114:52:46, 9.80s/it] 30%|███████████████████████████ | 17820/60000 [34:27:33<130:34:55, 11.14s/it] 30%|███████████████████████████ | 17840/60000 [34:31:07<123:18:54, 10.53s/it] 30%|███████████████████████████ | 17859/60000 [34:34:29<124:59:35, 10.68s/it] 30%|███████████████████████████ | 17879/60000 [34:37:57<122:33:15, 10.47s/it] 30%|███████████████████████████▏ | 17900/60000 [34:41:40<125:00:50, 10.69s/it] 
30%|███████████████████████████▏ | 17920/60000 [34:45:28<132:19:19, 11.32s/it] 30%|███████████████████████████▏ | 17940/60000 [34:49:09<129:39:29, 11.10s/it] 30%|███████████████████████████▏ | 17960/60000 [34:53:03<130:33:00, 11.18s/it] 30%|███████████████████████████▎ | 17979/60000 [34:56:34<125:20:14, 10.74s/it] 30%|███████████████████████████▎ | 17999/60000 [35:00:17<133:47:48, 11.47s/it] 30%|███████████████████████████▎ | 18000/60000 [35:00:28<133:47:03, 11.47s/it][INFO|trainer.py:3173] 2023-11-20 21:59:43,380 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-20 21:59:43,387 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-20 21:59:43,387 >> Batch size = 4 [INFO|trainer_utils.py:759] 2023-11-20 21:59:47,673 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. {'eval_loss': 0.1827392578125, 'eval_wer': 8.678854127402941, 'eval_runtime': 734.6885, 'eval_samples_per_second': 2.319, 'eval_steps_per_second': 0.58, 'epoch': 0.3} 30%|███████████████████████████▎ | 18000/60000 [35:12:43<133:47:03, 11.47s/it][INFO|trainer.py:2896] 2023-11-20 22:12:31,234 >> Saving model checkpoint to ./checkpoint-18000 [INFO|configuration_utils.py:462] 2023-11-20 22:12:31,268 >> Configuration saved in ./checkpoint-18000/config.json [INFO|configuration_utils.py:568] 2023-11-20 22:12:31,286 >> Configuration saved in ./checkpoint-18000/generation_config.json [2023-11-20 22:13:42,614] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step18000 is about to be saved! 
[2023-11-20 22:13:42,781] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-18000/global_step18000/mp_rank_00_model_states.pt [2023-11-20 22:13:42,781] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-18000/global_step18000/mp_rank_00_model_states.pt... [INFO|modeling_utils.py:2194] 2023-11-20 22:13:42,402 >> Model weights saved in ./checkpoint-18000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-20 22:13:42,438 >> Feature extractor saved in ./checkpoint-18000/preprocessor_config.json [2023-11-20 22:13:58,098] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-18000/global_step18000/mp_rank_00_model_states.pt. [2023-11-20 22:13:58,157] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-18000/global_step18000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-20 22:15:54,672] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-18000/global_step18000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-20 22:15:54,682] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-18000/global_step18000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-20 22:15:54,683] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step18000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-20 22:17:07,828 >> Feature extractor saved in ./preprocessor_config.json 30%|███████████████████████████▎ | 18019/60000 [35:22:13<147:24:35, 12.64s/it] 30%|███████████████████████████▎ | 18039/60000 [35:26:16<145:51:45, 12.51s/it] 30%|███████████████████████████▍ | 18059/60000 [35:30:39<138:35:50, 11.90s/it] 30%|███████████████████████████▍ | 18080/60000 [35:34:40<138:02:45, 11.86s/it] 30%|███████████████████████████▍ | 18100/60000 [35:38:22<131:04:12, 11.26s/it] 30%|███████████████████████████▍ | 18120/60000 [35:42:35<175:02:36, 15.05s/it] 30%|███████████████████████████▌ | 18140/60000 [35:46:21<149:42:24, 12.87s/it] Reading metadata...: 2165it [00:00, 12945.95it/s] | 18155/60000 [35:49:00<119:18:22, 10.26s/it] 30%|███████████████████████████▌ | 18159/60000 [35:49:44<123:22:18, 10.61s/it] 30%|███████████████████████████▌ | 18179/60000 [35:53:23<134:03:21, 11.54s/it] 30%|███████████████████████████▌ | 18200/60000 [35:57:17<127:55:12, 11.02s/it] 30%|███████████████████████████▋ | 18220/60000 [36:01:04<133:06:06, 11.47s/it] 30%|███████████████████████████▋ | 18239/60000 [36:04:52<144:58:13, 12.50s/it] 30%|███████████████████████████▋ | 18260/60000 [36:08:48<126:55:06, 10.95s/it] 30%|███████████████████████████▋ | 18280/60000 [36:12:34<127:34:36, 11.01s/it] 30%|███████████████████████████▊ | 18300/60000 [36:16:40<134:17:22, 11.59s/it] 31%|███████████████████████████▊ | 18319/60000 [36:20:12<137:56:38, 11.91s/it] 31%|███████████████████████████▊ | 18340/60000 [36:24:08<132:23:31, 11.44s/it] 31%|███████████████████████████▊ | 18360/60000 [36:28:14<138:47:15, 12.00s/it] 31%|███████████████████████████▊ | 18369/60000 [36:29:52<120:34:08, 10.43s/it] 31%|███████████████████████████▊ | 18370/60000 [36:29:59<108:12:54, 9.36s/it] 31%|███████████████████████████▉ | 18380/60000 [36:31:47<126:05:14, 10.91s/it] 31%|███████████████████████████▉ | 18400/60000 [36:35:23<123:26:11, 10.68s/it] 31%|███████████████████████████▉ | 18420/60000 
[36:39:13<123:42:02, 10.71s/it] 31%|███████████████████████████▉ | 18440/60000 [36:43:08<144:32:51, 12.52s/it] 31%|███████████████████████████▉ | 18460/60000 [36:46:49<125:39:24, 10.89s/it] 31%|████████████████████████████ | 18480/60000 [36:50:19<118:54:04, 10.31s/it] 31%|████████████████████████████ | 18499/60000 [36:53:43<122:38:08, 10.64s/it] 31%|████████████████████████████ | 18519/60000 [36:57:08<118:05:34, 10.25s/it] 31%|████████████████████████████ | 18539/60000 [37:00:37<116:38:47, 10.13s/it] 31%|████████████████████████████▏ | 18559/60000 [37:03:55<110:08:46, 9.57s/it] 31%|████████████████████████████▏ | 18579/60000 [37:07:20<111:57:28, 9.73s/it] 31%|████████████████████████████▏ | 18599/60000 [37:10:39<122:47:59, 10.68s/it] 31%|████████████████████████████▏ | 18619/60000 [37:14:05<118:02:03, 10.27s/it] 31%|████████████████████████████▎ | 18639/60000 [37:17:47<126:07:31, 10.98s/it] 31%|████████████████████████████▎ | 18659/60000 [37:21:06<110:10:44, 9.59s/it] 31%|████████████████████████████▎ | 18679/60000 [37:24:31<108:28:07, 9.45s/it] 31%|████████████████████████████▎ | 18699/60000 [37:27:43<108:56:44, 9.50s/it] 31%|████████████████████████████▍ | 18719/60000 [37:30:59<110:03:06, 9.60s/it] 31%|████████████████████████████▍ | 18739/60000 [37:34:19<115:54:48, 10.11s/it] 31%|████████████████████████████▍ | 18759/60000 [37:37:56<165:45:33, 14.47s/it] 31%|████████████████████████████▍ | 18779/60000 [37:41:10<112:19:00, 9.81s/it] Reading metadata...: 1650it [00:00, 4113.64it/s] | 18780/60000 [37:41:20<112:18:00, 9.81s/it] 31%|████████████████████████████▌ | 18799/60000 [37:44:21<107:05:38, 9.36s/it] 31%|████████████████████████████▌ | 18819/60000 [37:47:34<111:51:52, 9.78s/it] 31%|████████████████████████████▌ | 18839/60000 [37:50:44<108:03:53, 9.45s/it] 31%|████████████████████████████▌ | 18860/60000 [37:54:07<106:58:49, 9.36s/it] 31%|████████████████████████████▋ | 18880/60000 [37:57:14<106:47:22, 9.35s/it] 31%|████████████████████████████▋ | 18899/60000 
[38:00:10<105:54:15, 9.28s/it] 32%|████████████████████████████▋ | 18920/60000 [38:03:26<105:27:14, 9.24s/it] 32%|████████████████████████████▋ | 18939/60000 [38:06:27<110:03:11, 9.65s/it] 32%|████████████████████████████▊ | 18959/60000 [38:09:43<110:52:39, 9.73s/it] 32%|████████████████████████████▊ | 18980/60000 [38:13:00<109:19:27, 9.59s/it] 32%|████████████████████████████▊ | 19000/60000 [38:16:35<111:00:17, 9.75s/it][INFO|trainer.py:3173] 2023-11-21 01:15:49,608 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-21 01:15:49,609 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-21 01:15:49,609 >> Batch size = 4 {'loss': 0.0456, 'learning_rate': 2.0862203389830507e-06, 'epoch': 0.32} Reading metadata...: 1704it [00:00, 6722.89it/s] [INFO|trainer_utils.py:759] 2023-11-21 01:15:50,677 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. {'eval_loss': 0.17822265625, 'eval_wer': 8.518658122879758, 'eval_runtime': 591.7997, 'eval_samples_per_second': 2.879, 'eval_steps_per_second': 0.72, 'epoch': 0.32} 32%|████████████████████████████▊ | 19000/60000 [38:26:26<111:00:17, 9.75s/it][INFO|trainer.py:2896] 2023-11-21 01:26:10,667 >> Saving model checkpoint to ./checkpoint-19000 [INFO|configuration_utils.py:462] 2023-11-21 01:26:10,674 >> Configuration saved in ./checkpoint-19000/config.json [INFO|configuration_utils.py:568] 2023-11-21 01:26:10,679 >> Configuration saved in ./checkpoint-19000/generation_config.json [2023-11-21 01:26:51,330] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step19000 is about to be saved! 
[2023-11-21 01:26:51,354] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-19000/global_step19000/mp_rank_00_model_states.pt [2023-11-21 01:26:51,354] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-19000/global_step19000/mp_rank_00_model_states.pt... [INFO|modeling_utils.py:2194] 2023-11-21 01:26:51,298 >> Model weights saved in ./checkpoint-19000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-21 01:26:51,303 >> Feature extractor saved in ./checkpoint-19000/preprocessor_config.json [2023-11-21 01:26:56,815] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-19000/global_step19000/mp_rank_00_model_states.pt. [2023-11-21 01:26:56,821] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-19000/global_step19000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-21 01:27:16,145] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-19000/global_step19000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-21 01:27:16,155] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-19000/global_step19000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-21 01:27:16,156] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step19000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-21 01:28:14,214 >> Feature extractor saved in ./preprocessor_config.json 32%|████████████████████████████▊ | 19019/60000 [38:32:04<113:16:16, 9.95s/it] 32%|████████████████████████████▉ | 19040/60000 [38:35:58<107:24:44, 9.44s/it] 32%|████████████████████████████▉ | 19059/60000 [38:38:59<106:57:07, 9.40s/it] 32%|████████████████████████████▉ | 19079/60000 [38:42:10<107:48:53, 9.48s/it] 32%|████████████████████████████▉ | 19099/60000 [38:45:48<107:55:19, 9.50s/it] 32%|████████████████████████████▉ | 19119/60000 [38:49:00<107:04:14, 9.43s/it] 32%|█████████████████████████████▎ | 19133/60000 [38:51:09<96:13:34, 8.48s/it] 32%|█████████████████████████████ | 19140/60000 [38:52:15<105:34:26, 9.30s/it] 32%|█████████████████████████████ | 19160/60000 [38:55:22<107:01:04, 9.43s/it] 32%|█████████████████████████████ | 19179/60000 [38:58:21<105:55:53, 9.34s/it] 32%|█████████████████████████████ | 19200/60000 [39:01:44<117:40:39, 10.38s/it] 32%|█████████████████████████████▏ | 19219/60000 [39:04:44<107:07:13, 9.46s/it] 32%|█████████████████████████████▏ | 19240/60000 [39:08:00<105:38:17, 9.33s/it] 32%|█████████████████████████████▏ | 19259/60000 [39:11:14<139:16:18, 12.31s/it] 32%|█████████████████████████████▏ | 19279/60000 [39:14:25<108:01:05, 9.55s/it] 32%|█████████████████████████████▎ | 19300/60000 [39:18:02<109:49:16, 9.71s/it] 32%|█████████████████████████████▎ | 19320/60000 [39:21:11<109:05:29, 9.65s/it] 32%|█████████████████████████████▎ | 19339/60000 [39:24:11<107:31:50, 9.52s/it] 32%|█████████████████████████████▎ | 19360/60000 [39:27:28<106:27:32, 9.43s/it] 32%|█████████████████████████████▍ | 19380/60000 [39:30:42<110:31:15, 9.80s/it] 32%|█████████████████████████████▍ | 19399/60000 [39:33:44<108:46:20, 9.64s/it] 32%|█████████████████████████████▍ | 19420/60000 [39:37:05<108:00:18, 9.58s/it] 32%|█████████████████████████████▍ | 19440/60000 [39:40:18<116:33:09, 10.34s/it] 32%|█████████████████████████████▌ | 
19453/60000 [39:42:24<100:52:54, 8.96s/it] Reading metadata...: 2165it [00:00, 12527.03it/s] | 19455/60000 [39:42:44<105:26:55, 9.36s/it] 32%|█████████████████████████████▌ | 19460/60000 [39:43:34<110:43:09, 9.83s/it] 32%|█████████████████████████████▌ | 19480/60000 [39:46:50<108:27:51, 9.64s/it] 32%|█████████████████████████████▌ | 19499/60000 [39:49:50<105:19:55, 9.36s/it] 33%|█████████████████████████████▌ | 19519/60000 [39:53:11<109:22:02, 9.73s/it] 33%|█████████████████████████████▋ | 19540/60000 [39:56:32<107:12:26, 9.54s/it] 33%|█████████████████████████████▋ | 19560/60000 [39:59:45<104:43:33, 9.32s/it] 33%|█████████████████████████████▋ | 19580/60000 [40:02:51<103:39:51, 9.23s/it] 33%|█████████████████████████████▋ | 19600/60000 [40:06:34<112:45:59, 10.05s/it] 33%|█████████████████████████████▊ | 19620/60000 [40:09:50<126:51:25, 11.31s/it] 33%|█████████████████████████████▊ | 19640/60000 [40:12:57<103:51:43, 9.26s/it] 33%|█████████████████████████████▊ | 19660/60000 [40:16:08<106:15:40, 9.48s/it] 33%|█████████████████████████████▊ | 19680/60000 [40:19:15<104:46:21, 9.35s/it] 33%|█████████████████████████████▉ | 19700/60000 [40:22:21<104:21:25, 9.32s/it] 33%|█████████████████████████████▉ | 19720/60000 [40:25:28<106:03:53, 9.48s/it] 33%|█████████████████████████████▉ | 19740/60000 [40:28:38<102:14:09, 9.14s/it] 33%|█████████████████████████████▉ | 19759/60000 [40:31:37<105:55:25, 9.48s/it] Reading metadata...: 1650it [00:00, 9427.38it/s] | 19770/60000 [40:33:19<104:04:31, 9.31s/it] 33%|█████████████████████████████▉ | 19779/60000 [40:34:44<104:28:16, 9.35s/it] 33%|██████████████████████████████ | 19799/60000 [40:37:52<107:07:45, 9.59s/it] 33%|██████████████████████████████ | 19819/60000 [40:41:28<149:22:40, 13.38s/it] 33%|██████████████████████████████ | 19839/60000 [40:44:35<103:14:57, 9.26s/it] 33%|██████████████████████████████ | 19859/60000 [40:47:40<103:20:11, 9.27s/it] 33%|██████████████████████████████▏ | 19879/60000 [40:50:46<103:22:51, 9.28s/it] 
33%|██████████████████████████████▏ | 19899/60000 [40:53:51<103:05:17, 9.25s/it] 33%|██████████████████████████████▏ | 19919/60000 [40:57:00<102:54:58, 9.24s/it] 33%|██████████████████████████████▏ | 19939/60000 [41:00:31<113:58:01, 10.24s/it] 33%|██████████████████████████████▎ | 19960/60000 [41:04:19<104:54:09, 9.43s/it] 33%|██████████████████████████████▎ | 19979/60000 [41:07:18<103:49:27, 9.34s/it] 33%|██████████████████████████████▎ | 19999/60000 [41:10:29<104:45:41, 9.43s/it] 33%|██████████████████████████████▎ | 20000/60000 [41:10:38<103:12:29, 9.29s/it][INFO|trainer.py:3173] 2023-11-21 04:09:53,205 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-21 04:09:53,206 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-21 04:09:53,206 >> Batch size = 4 Reading metadata...: 1704it [00:00, 8993.55it/s] [INFO|trainer_utils.py:759] 2023-11-21 04:09:54,170 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 33%|██████████████████████████████▎ | 20000/60000 [41:20:43<103:12:29, 9.29s/it][INFO|trainer.py:2896] 2023-11-21 04:20:30,221 >> Saving model checkpoint to ./checkpoint-20000 [INFO|configuration_utils.py:462] 2023-11-21 04:20:30,236 >> Configuration saved in ./checkpoint-20000/config.json [INFO|configuration_utils.py:568] 2023-11-21 04:20:30,248 >> Configuration saved in ./checkpoint-20000/generation_config.json [2023-11-21 04:21:16,086] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step20000 is about to be saved! 
[2023-11-21 04:21:16,121] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-20000/global_step20000/mp_rank_00_model_states.pt [2023-11-21 04:21:16,121] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-20000/global_step20000/mp_rank_00_model_states.pt... [INFO|modeling_utils.py:2194] 2023-11-21 04:21:16,042 >> Model weights saved in ./checkpoint-20000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-21 04:21:16,050 >> Feature extractor saved in ./checkpoint-20000/preprocessor_config.json [2023-11-21 04:21:24,359] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-20000/global_step20000/mp_rank_00_model_states.pt. [2023-11-21 04:21:24,370] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-20000/global_step20000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-21 04:21:47,972] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-20000/global_step20000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-21 04:21:47,997] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-20000/global_step20000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-21 04:21:47,998] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step20000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-21 04:22:45,221 >> Feature extractor saved in ./preprocessor_config.json 33%|██████████████████████████████▎ | 20019/60000 [41:26:37<113:49:18, 10.25s/it] 33%|██████████████████████████████▍ | 20040/60000 [41:29:57<105:38:11, 9.52s/it] 33%|██████████████████████████████▍ | 20059/60000 [41:32:57<104:19:24, 9.40s/it] 33%|██████████████████████████████▍ | 20079/60000 [41:36:10<104:43:13, 9.44s/it] 33%|██████████████████████████████▍ | 20099/60000 [41:39:21<102:28:57, 9.25s/it] 34%|██████████████████████████████▌ | 20120/60000 [41:42:39<103:10:01, 9.31s/it] 34%|██████████████████████████████▌ | 20139/60000 [41:45:36<103:42:49, 9.37s/it] 34%|██████████████████████████████▌ | 20160/60000 [41:48:52<104:50:00, 9.47s/it] 34%|██████████████████████████████▌ | 20180/60000 [41:52:03<103:59:30, 9.40s/it] 34%|██████████████████████████████▋ | 20200/60000 [41:55:09<103:43:33, 9.38s/it] 34%|██████████████████████████████▋ | 20220/60000 [41:58:15<103:08:21, 9.33s/it] 34%|██████████████████████████████▋ | 20239/60000 [42:01:14<104:16:36, 9.44s/it] 34%|██████████████████████████████▋ | 20260/60000 [42:04:34<110:43:14, 10.03s/it] 34%|██████████████████████████████▊ | 20280/60000 [42:07:56<103:40:30, 9.40s/it] 34%|██████████████████████████████▊ | 20300/60000 [42:11:16<101:47:45, 9.23s/it] 34%|██████████████████████████████▊ | 20320/60000 [42:14:22<103:26:00, 9.38s/it] 34%|██████████████████████████████▊ | 20340/60000 [42:17:27<101:40:12, 9.23s/it] 34%|██████████████████████████████▉ | 20360/60000 [42:20:39<103:11:34, 9.37s/it] 34%|██████████████████████████████▉ | 20380/60000 [42:23:45<100:46:47, 9.16s/it] 34%|██████████████████████████████▉ | 20400/60000 [42:26:52<102:08:45, 9.29s/it] 34%|██████████████████████████████▉ | 20420/60000 [42:29:58<102:05:41, 9.29s/it] 34%|███████████████████████████████ | 20440/60000 [42:33:11<103:48:22, 9.45s/it] 34%|███████████████████████████████ | 20460/60000 [42:36:17<102:03:14, 9.29s/it] 
34%|███████████████████████████████ | 20480/60000 [42:39:27<101:25:33, 9.24s/it] 34%|███████████████████████████████ | 20500/60000 [42:42:33<101:05:33, 9.21s/it] 34%|███████████████████████████████ | 20520/60000 [42:45:39<102:21:48, 9.33s/it] 34%|███████████████████████████████▏ | 20540/60000 [42:48:50<101:24:36, 9.25s/it] 34%|███████████████████████████████▏ | 20560/60000 [42:52:20<108:37:50, 9.92s/it] 34%|███████████████████████████████▏ | 20579/60000 [42:55:16<101:06:12, 9.23s/it] 34%|███████████████████████████████▏ | 20599/60000 [42:58:23<102:52:01, 9.40s/it] 34%|███████████████████████████████▎ | 20619/60000 [43:01:45<126:00:02, 11.52s/it] 34%|███████████████████████████████▎ | 20639/60000 [43:04:51<100:52:25, 9.23s/it] 34%|███████████████████████████████▎ | 20659/60000 [43:07:57<101:39:57, 9.30s/it] 34%|███████████████████████████████▎ | 20679/60000 [43:11:03<101:38:08, 9.31s/it] 34%|███████████████████████████████▍ | 20699/60000 [43:14:13<117:10:49, 10.73s/it] 35%|███████████████████████████████▍ | 20719/60000 [43:17:18<100:30:41, 9.21s/it] 35%|███████████████████████████████▍ | 20739/60000 [43:20:25<101:21:26, 9.29s/it] Reading metadata...: 2165it [00:00, 13484.43it/s] | 20754/60000 [43:22:44<101:05:28, 9.27s/it] 35%|███████████████████████████████▍ | 20759/60000 [43:23:32<101:59:08, 9.36s/it] Reading metadata...: 1650it [00:00, 10321.66it/s] | 20760/60000 [43:23:41<101:22:53, 9.30s/it] 35%|███████████████████████████████▌ | 20779/60000 [43:26:38<101:18:29, 9.30s/it] 35%|███████████████████████████████▌ | 20800/60000 [43:29:59<102:07:43, 9.38s/it] 35%|███████████████████████████████▌ | 20819/60000 [43:32:55<100:25:02, 9.23s/it] 35%|███████████████████████████████▌ | 20839/60000 [43:36:00<100:47:45, 9.27s/it] 35%|███████████████████████████████▋ | 20859/60000 [43:39:38<100:04:39, 9.20s/it] 35%|███████████████████████████████▋ | 20880/60000 [43:42:58<102:52:32, 9.47s/it] 35%|███████████████████████████████▋ | 20899/60000 [43:46:12<107:46:48, 9.92s/it] 
35%|███████████████████████████████▋ | 20919/60000 [43:49:18<100:27:49, 9.25s/it] 35%|███████████████████████████████▊ | 20940/60000 [43:52:35<101:49:30, 9.38s/it] 35%|███████████████████████████████▊ | 20959/60000 [43:55:41<102:31:18, 9.45s/it] 35%|███████████████████████████████▊ | 20980/60000 [43:59:01<101:29:01, 9.36s/it] 35%|███████████████████████████████▊ | 20999/60000 [44:01:58<101:28:52, 9.37s/it] 35%|███████████████████████████████▊ | 21000/60000 [44:02:08<101:54:19, 9.41s/it][INFO|trainer.py:3173] 2023-11-21 07:01:22,922 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-21 07:01:22,922 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-21 07:01:22,922 >> Batch size = 4 Reading metadata...: 1704it [00:00, 4141.28it/s] [INFO|trainer_utils.py:759] 2023-11-21 07:01:24,721 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 
35%|███████████████████████████████▊ | 21000/60000 [44:12:10<101:54:19, 9.41s/it][INFO|trainer.py:2896] 2023-11-21 07:11:37,065 >> Saving model checkpoint to ./checkpoint-21000 [INFO|configuration_utils.py:462] 2023-11-21 07:11:37,071 >> Configuration saved in ./checkpoint-21000/config.json [INFO|configuration_utils.py:568] 2023-11-21 07:11:37,086 >> Configuration saved in ./checkpoint-21000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-21 07:11:59,193 >> Model weights saved in ./checkpoint-21000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-21 07:11:59,199 >> Feature extractor saved in ./checkpoint-21000/preprocessor_config.json [2023-11-21 07:11:59,218] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step21000 is about to be saved! [2023-11-21 07:11:59,300] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-21000/global_step21000/mp_rank_00_model_states.pt [2023-11-21 07:11:59,300] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-21000/global_step21000/mp_rank_00_model_states.pt... [2023-11-21 07:12:08,069] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-21000/global_step21000/mp_rank_00_model_states.pt. [2023-11-21 07:12:08,079] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-21000/global_step21000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-21 07:12:29,737] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-21000/global_step21000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-21 07:12:29,745] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-21000/global_step21000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-21 07:12:29,746] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step21000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-21 07:13:25,889 >> Feature extractor saved in ./preprocessor_config.json 35%|███████████████████████████████▉ | 21019/60000 [44:17:24<112:02:41, 10.35s/it] 35%|███████████████████████████████▉ | 21039/60000 [44:20:32<101:29:50, 9.38s/it] 35%|███████████████████████████████▉ | 21060/60000 [44:23:57<102:19:47, 9.46s/it] 35%|███████████████████████████████▉ | 21080/60000 [44:27:17<103:03:05, 9.53s/it] 35%|████████████████████████████████ | 21100/60000 [44:30:27<102:13:26, 9.46s/it] 35%|████████████████████████████████ | 21119/60000 [44:33:27<103:06:30, 9.55s/it] 35%|████████████████████████████████ | 21139/60000 [44:36:39<116:56:17, 10.83s/it] 35%|████████████████████████████████ | 21160/60000 [44:39:56<100:47:25, 9.34s/it] 35%|████████████████████████████████ | 21179/60000 [44:42:54<100:33:07, 9.32s/it] 35%|████████████████████████████████▌ | 21200/60000 [44:46:10<99:36:18, 9.24s/it] 35%|████████████████████████████████▌ | 21219/60000 [44:49:06<98:52:38, 9.18s/it] 35%|████████████████████████████████▏ | 21239/60000 [44:52:18<101:56:07, 9.47s/it] 35%|████████████████████████████████▌ | 21259/60000 [44:55:23<99:54:35, 9.28s/it] 35%|████████████████████████████████▋ | 21280/60000 [44:58:37<99:26:38, 9.25s/it] 36%|████████████████████████████████▎ | 21300/60000 [45:02:15<102:33:30, 9.54s/it] 36%|████████████████████████████████▎ | 21319/60000 [45:05:16<106:34:57, 9.92s/it] 36%|████████████████████████████████▋ | 21340/60000 [45:08:31<98:39:19, 9.19s/it] 36%|████████████████████████████████▍ | 21359/60000 [45:11:28<100:12:46, 9.34s/it] 36%|████████████████████████████████▍ | 21380/60000 [45:15:24<132:22:59, 12.34s/it] 36%|████████████████████████████████▍ | 21400/60000 [45:18:36<100:26:28, 9.37s/it] 36%|████████████████████████████████▍ | 21420/60000 [45:21:49<100:55:45, 9.42s/it] 36%|████████████████████████████████▌ | 21439/60000 [45:24:48<101:35:01, 9.48s/it] 36%|████████████████████████████████▌ | 21460/60000 
[45:28:15<105:57:54, 9.90s/it] 36%|████████████████████████████████▌ | 21480/60000 [45:31:33<106:30:07, 9.95s/it] 36%|████████████████████████████████▌ | 21500/60000 [45:34:54<107:02:52, 10.01s/it] 36%|████████████████████████████████▉ | 21519/60000 [45:37:54<99:40:32, 9.32s/it] 36%|█████████████████████████████████ | 21539/60000 [45:40:59<98:23:17, 9.21s/it] 36%|█████████████████████████████████ | 21559/60000 [45:44:07<99:44:53, 9.34s/it] 36%|████████████████████████████████▋ | 21579/60000 [45:47:13<100:00:18, 9.37s/it] 36%|████████████████████████████████▊ | 21600/60000 [45:50:35<100:22:53, 9.41s/it] 36%|████████████████████████████████▊ | 21620/60000 [45:53:51<112:57:52, 10.60s/it] 36%|█████████████████████████████████▏ | 21623/60000 [45:54:16<93:17:22, 8.75s/it] 36%|█████████████████████████████████▏ | 21640/60000 [45:56:53<98:50:19, 9.28s/it] 36%|█████████████████████████████████▏ | 21660/60000 [46:00:00<98:34:35, 9.26s/it] 36%|████████████████████████████████▉ | 21680/60000 [46:03:14<102:20:24, 9.61s/it] 36%|█████████████████████████████████▎ | 21700/60000 [46:06:21<99:49:14, 9.38s/it] 36%|█████████████████████████████████▎ | 21720/60000 [46:09:26<98:46:04, 9.29s/it] 36%|█████████████████████████████████▎ | 21740/60000 [46:12:33<99:50:42, 9.39s/it] Reading metadata...: 1650it [00:00, 7772.12it/s] | 21750/60000 [46:14:26<141:25:59, 13.31s/it] 36%|█████████████████████████████████ | 21760/60000 [46:16:04<104:50:00, 9.87s/it] 36%|█████████████████████████████████ | 21780/60000 [46:19:19<100:52:25, 9.50s/it] 36%|█████████████████████████████████▍ | 21799/60000 [46:22:18<99:26:29, 9.37s/it] 36%|█████████████████████████████████ | 21819/60000 [46:25:27<101:29:49, 9.57s/it] 36%|█████████████████████████████████▍ | 21840/60000 [46:28:43<97:58:22, 9.24s/it] 36%|█████████████████████████████████▏ | 21860/60000 [46:32:18<108:28:36, 10.24s/it] 36%|█████████████████████████████████▌ | 21879/60000 [46:35:15<98:36:15, 9.31s/it] 36%|█████████████████████████████████▏ | 
21899/60000 [46:38:24<100:11:02, 9.47s/it] 37%|█████████████████████████████████▏ | 21919/60000 [46:41:35<101:47:46, 9.62s/it] 37%|█████████████████████████████████▎ | 21939/60000 [46:44:52<104:12:16, 9.86s/it] 37%|█████████████████████████████████▎ | 21959/60000 [46:48:08<100:00:40, 9.46s/it] 37%|█████████████████████████████████▋ | 21980/60000 [46:51:23<98:55:55, 9.37s/it] 37%|█████████████████████████████████▋ | 21999/60000 [46:54:20<99:12:52, 9.40s/it] 37%|█████████████████████████████████▎ | 22000/60000 [46:54:30<100:45:44, 9.55s/it][INFO|trainer.py:3173] 2023-11-21 09:53:45,184 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-21 09:53:45,185 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-21 09:53:45,185 >> Batch size = 4 Reading metadata...: 1704it [00:00, 8936.45it/s] [INFO|trainer_utils.py:759] 2023-11-21 09:53:46,128 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 
{'eval_loss': 0.1820068359375, 'eval_wer': 8.528081417263476, 'eval_runtime': 601.4562, 'eval_samples_per_second': 2.833, 'eval_steps_per_second': 0.708, 'epoch': 0.37} 37%|█████████████████████████████████▎ | 22000/60000 [47:04:32<100:45:44, 9.55s/it][INFO|trainer.py:2896] 2023-11-21 10:04:17,934 >> Saving model checkpoint to ./checkpoint-22000 [INFO|configuration_utils.py:462] 2023-11-21 10:04:17,944 >> Configuration saved in ./checkpoint-22000/config.json [INFO|configuration_utils.py:568] 2023-11-21 10:04:17,948 >> Configuration saved in ./checkpoint-22000/generation_config.json [2023-11-21 10:05:05,458] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step22000 is about to be saved! [2023-11-21 10:05:05,479] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-22000/global_step22000/mp_rank_00_model_states.pt [2023-11-21 10:05:05,479] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-22000/global_step22000/mp_rank_00_model_states.pt... [INFO|modeling_utils.py:2194] 2023-11-21 10:05:05,440 >> Model weights saved in ./checkpoint-22000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-21 10:05:05,445 >> Feature extractor saved in ./checkpoint-22000/preprocessor_config.json [2023-11-21 10:05:15,298] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-22000/global_step22000/mp_rank_00_model_states.pt. [2023-11-21 10:05:15,316] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-22000/global_step22000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-21 10:05:40,030] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-22000/global_step22000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2023-11-21 10:05:40,041] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-22000/global_step22000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-21 10:05:40,042] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step22000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-21 10:06:38,208 >> Feature extractor saved in ./preprocessor_config.json 37%|█████████████████████████████████▍ | 22019/60000 [47:10:27<104:40:14, 9.92s/it] 37%|█████████████████████████████████▍ | 22039/60000 [47:14:12<101:51:28, 9.66s/it] Reading metadata...: 2165it [00:00, 10884.72it/s] | 22053/60000 [47:16:25<100:24:43, 9.53s/it] 37%|█████████████████████████████████▍ | 22059/60000 [47:17:24<101:43:10, 9.65s/it] 37%|█████████████████████████████████▍ | 22079/60000 [47:20:40<101:41:23, 9.65s/it] 37%|█████████████████████████████████▉ | 22099/60000 [47:23:49<98:05:38, 9.32s/it] 37%|█████████████████████████████████▌ | 22119/60000 [47:27:03<100:20:53, 9.54s/it] 37%|█████████████████████████████████▉ | 22139/60000 [47:30:10<98:54:48, 9.41s/it] 37%|█████████████████████████████████▉ | 22160/60000 [47:33:29<99:27:33, 9.46s/it] 37%|██████████████████████████████████ | 22179/60000 [47:36:28<97:32:43, 9.28s/it] 37%|█████████████████████████████████▋ | 22199/60000 [47:39:39<107:46:06, 10.26s/it] 37%|██████████████████████████████████ | 22219/60000 [47:42:46<98:15:45, 9.36s/it] 37%|██████████████████████████████████ | 22239/60000 [47:45:54<99:18:55, 9.47s/it] 37%|██████████████████████████████████▏ | 22259/60000 [47:49:00<97:02:42, 9.26s/it] 37%|██████████████████████████████████▏ | 22279/60000 [47:52:09<98:56:59, 9.44s/it] 37%|██████████████████████████████████▏ | 22300/60000 [47:55:40<97:28:49, 9.31s/it] 37%|██████████████████████████████████▏ | 22319/60000 [47:58:37<97:27:39, 9.31s/it] 37%|██████████████████████████████████▎ | 22339/60000 [48:01:43<98:52:52, 9.45s/it] 37%|██████████████████████████████████▎ | 22359/60000 
[48:04:50<98:06:38, 9.38s/it] 37%|██████████████████████████████████▎ | 22379/60000 [48:08:00<99:53:28, 9.56s/it] 37%|██████████████████████████████████▎ | 22399/60000 [48:11:08<97:22:27, 9.32s/it] 37%|██████████████████████████████████▍ | 22420/60000 [48:14:25<98:12:37, 9.41s/it] 37%|██████████████████████████████████▍ | 22439/60000 [48:17:22<96:07:56, 9.21s/it] 37%|██████████████████████████████████▍ | 22459/60000 [48:20:28<96:53:35, 9.29s/it] 37%|██████████████████████████████████▍ | 22480/60000 [48:23:47<95:10:57, 9.13s/it] 38%|██████████████████████████████████▌ | 22500/60000 [48:26:53<95:56:35, 9.21s/it] 38%|██████████████████████████████████▏ | 22520/60000 [48:30:11<109:28:18, 10.51s/it] 38%|██████████████████████████████████▌ | 22540/60000 [48:33:18<96:13:10, 9.25s/it] 38%|██████████████████████████████████▌ | 22559/60000 [48:36:20<97:07:08, 9.34s/it] 38%|██████████████████████████████████▌ | 22580/60000 [48:39:35<96:44:00, 9.31s/it] 38%|██████████████████████████████████▋ | 22599/60000 [48:42:32<96:33:17, 9.29s/it] 38%|██████████████████████████████████▎ | 22620/60000 [48:45:59<106:36:32, 10.27s/it] 38%|██████████████████████████████████▋ | 22624/60000 [48:46:33<88:54:09, 8.56s/it] 38%|██████████████████████████████████▋ | 22625/60000 [48:46:39<81:48:56, 7.88s/it] 38%|██████████████████████████████████▋ | 22639/60000 [48:48:50<98:20:02, 9.48s/it] 38%|██████████████████████████████████▎ | 22660/60000 [48:52:37<126:14:58, 12.17s/it] 38%|██████████████████████████████████▊ | 22680/60000 [48:55:43<95:38:07, 9.23s/it] 38%|██████████████████████████████████▊ | 22700/60000 [48:58:57<99:44:08, 9.63s/it] 38%|██████████████████████████████████▍ | 22720/60000 [49:02:08<100:35:44, 9.71s/it] 38%|██████████████████████████████████▊ | 22740/60000 [49:05:20<95:18:31, 9.21s/it] Reading metadata...: 1650it [00:00, 7451.68it/s] | 22741/60000 [49:05:29<95:37:07, 9.24s/it] 38%|██████████████████████████████████▌ | 22759/60000 [49:08:43<147:57:14, 14.30s/it] 
38%|██████████████████████████████████▉ | 22780/60000 [49:12:00<96:03:03, 9.29s/it] 38%|██████████████████████████████████▌ | 22800/60000 [49:15:22<103:32:34, 10.02s/it] 38%|██████████████████████████████████▌ | 22819/60000 [49:18:25<102:39:58, 9.94s/it] 38%|███████████████████████████████████ | 22840/60000 [49:21:41<96:49:33, 9.38s/it] 38%|███████████████████████████████████ | 22860/60000 [49:24:50<95:53:40, 9.30s/it] 38%|███████████████████████████████████ | 22879/60000 [49:27:50<98:57:57, 9.60s/it] 38%|███████████████████████████████████ | 22899/60000 [49:30:57<97:39:17, 9.48s/it] 38%|███████████████████████████████████▏ | 22920/60000 [49:34:18<96:48:01, 9.40s/it] 38%|███████████████████████████████████▏ | 22940/60000 [49:37:24<94:58:13, 9.23s/it] 38%|███████████████████████████████████▏ | 22959/60000 [49:40:30<96:04:32, 9.34s/it] 38%|███████████████████████████████████▏ | 22980/60000 [49:43:46<94:46:28, 9.22s/it] 38%|███████████████████████████████████▎ | 23000/60000 [49:46:58<95:46:43, 9.32s/it][INFO|trainer.py:3173] 2023-11-21 12:46:13,232 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-21 12:46:13,232 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-21 12:46:13,232 >> Batch size = 4 Reading metadata...: 1704it [00:00, 8490.40it/s] [INFO|trainer_utils.py:759] 2023-11-21 12:46:14,237 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 
{'loss': 0.0463, 'learning_rate': 1.883084745762712e-06, 'epoch': 0.38} 38%|███████████████████████████████████▎ | 23000/60000 [49:57:06<95:46:43, 9.32s/it] 38%|███████████████████████████████████▎ | 23000/60000 [49:57:06<95:46:43, 9.32s/it][INFO|trainer.py:2896] 2023-11-21 12:56:42,686 >> Saving model checkpoint to ./checkpoint-23000 [INFO|configuration_utils.py:462] 2023-11-21 12:56:42,697 >> Configuration saved in ./checkpoint-23000/config.json [INFO|configuration_utils.py:568] 2023-11-21 12:56:42,710 >> Configuration saved in ./checkpoint-23000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-21 12:57:28,606 >> Model weights saved in ./checkpoint-23000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-21 12:57:28,613 >> Feature extractor saved in ./checkpoint-23000/preprocessor_config.json [2023-11-21 12:57:28,646] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step23000 is about to be saved! [2023-11-21 12:57:28,731] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-23000/global_step23000/mp_rank_00_model_states.pt [2023-11-21 12:57:28,732] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-23000/global_step23000/mp_rank_00_model_states.pt... [2023-11-21 12:57:37,797] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-23000/global_step23000/mp_rank_00_model_states.pt. [2023-11-21 12:57:37,803] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-23000/global_step23000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-21 12:57:57,138] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-23000/global_step23000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2023-11-21 12:57:57,147] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-23000/global_step23000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-21 12:57:57,148] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step23000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-21 12:58:55,172 >> Feature extractor saved in ./preprocessor_config.json 38%|██████████████████████████████████▉ | 23020/60000 [50:02:54<100:26:12, 9.78s/it] 38%|███████████████████████████████████▎ | 23040/60000 [50:06:04<99:14:41, 9.67s/it] 38%|███████████████████████████████████▎ | 23060/60000 [50:09:15<97:21:26, 9.49s/it] 38%|███████████████████████████████████ | 23080/60000 [50:12:33<113:46:18, 11.09s/it] 38%|███████████████████████████████████▍ | 23100/60000 [50:15:41<95:58:13, 9.36s/it] 39%|███████████████████████████████████▍ | 23119/60000 [50:18:39<95:31:48, 9.32s/it] 39%|███████████████████████████████████▍ | 23140/60000 [50:21:54<95:03:51, 9.28s/it] 39%|███████████████████████████████████▏ | 23160/60000 [50:25:39<215:56:02, 21.10s/it] 39%|███████████████████████████████████▌ | 23180/60000 [50:28:51<95:16:19, 9.32s/it] 39%|███████████████████████████████████▌ | 23200/60000 [50:32:00<96:32:42, 9.44s/it] 39%|███████████████████████████████████▌ | 23220/60000 [50:35:04<94:06:45, 9.21s/it] 39%|███████████████████████████████████▋ | 23240/60000 [50:38:10<93:37:16, 9.17s/it] 39%|███████████████████████████████████▎ | 23260/60000 [50:41:21<101:23:37, 9.94s/it] 39%|███████████████████████████████████▋ | 23279/60000 [50:44:33<97:17:24, 9.54s/it] 39%|███████████████████████████████████▋ | 23300/60000 [50:48:03<97:17:24, 9.54s/it] 39%|███████████████████████████████████▊ | 23320/60000 [50:51:11<96:22:56, 9.46s/it] 39%|███████████████████████████████████▊ | 23339/60000 [50:54:09<96:14:37, 9.45s/it] Reading metadata...: 2165it [00:00, 12657.98it/s] | 23353/60000 [50:56:25<94:59:11, 9.33s/it] 39%|███████████████████████████████████▊ | 
23360/60000 [50:57:31<94:54:32, 9.33s/it] 39%|███████████████████████████████████▊ | 23379/60000 [51:00:28<94:12:49, 9.26s/it] 39%|███████████████████████████████████▉ | 23400/60000 [51:03:42<94:22:16, 9.28s/it] 39%|███████████████████████████████████▉ | 23420/60000 [51:06:49<95:00:41, 9.35s/it] 39%|███████████████████████████████████▉ | 23440/60000 [51:10:02<96:06:33, 9.46s/it] 39%|███████████████████████████████████▉ | 23460/60000 [51:13:08<93:39:07, 9.23s/it] 39%|████████████████████████████████████ | 23480/60000 [51:16:15<94:32:40, 9.32s/it] 39%|███████████████████████████████████▋ | 23500/60000 [51:19:46<100:41:14, 9.93s/it] 39%|████████████████████████████████████ | 23520/60000 [51:22:54<95:27:04, 9.42s/it] 39%|████████████████████████████████████ | 23540/60000 [51:26:07<93:19:50, 9.22s/it] 39%|████████████████████████████████████▏ | 23560/60000 [51:29:13<93:31:32, 9.24s/it] 39%|████████████████████████████████████▏ | 23580/60000 [51:32:45<95:58:45, 9.49s/it] 39%|████████████████████████████████████▏ | 23600/60000 [51:35:50<92:28:39, 9.15s/it] 39%|████████████████████████████████████▏ | 23620/60000 [51:39:11<99:17:18, 9.83s/it] 39%|████████████████████████████████████▏ | 23626/60000 [51:40:03<84:43:43, 8.39s/it] [2023-11-21 14:39:18,403] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 39%|████████████████████████████████████▏ | 23640/60000 [51:42:11<94:07:28, 9.32s/it] 39%|████████████████████████████████████▎ | 23660/60000 [51:45:15<93:01:22, 9.22s/it] 39%|████████████████████████████████████▎ | 23680/60000 [51:48:21<93:24:48, 9.26s/it] 40%|███████████████████████████████████▉ | 23700/60000 [51:51:32<101:50:55, 10.10s/it] 40%|████████████████████████████████████▎ | 23719/60000 [51:54:29<94:03:13, 9.33s/it] Reading metadata...: 1650it [00:00, 10372.32it/s] | 23731/60000 [51:56:21<94:49:48, 9.41s/it] 40%|████████████████████████████████████▍ | 23739/60000 [51:57:38<94:36:59, 9.39s/it] 40%|████████████████████████████████████▍ | 23759/60000 [52:01:03<96:08:41, 9.55s/it] 40%|████████████████████████████████████▍ | 23779/60000 [52:04:10<93:29:28, 9.29s/it] 40%|████████████████████████████████████▍ | 23799/60000 [52:07:23<93:12:16, 9.27s/it] 40%|████████████████████████████████████▌ | 23819/60000 [52:10:31<93:28:15, 9.30s/it] 40%|████████████████████████████████████▌ | 23839/60000 [52:13:37<95:52:23, 9.54s/it] 40%|████████████████████████████████████▌ | 23860/60000 [52:16:56<95:24:39, 9.50s/it] 40%|████████████████████████████████████▌ | 23879/60000 [52:19:58<95:55:45, 9.56s/it] 40%|████████████████████████████████████▋ | 23899/60000 [52:23:05<93:44:10, 9.35s/it] 40%|████████████████████████████████████▋ | 23919/60000 [52:26:11<93:00:43, 9.28s/it] 40%|████████████████████████████████████▋ | 23940/60000 [52:29:29<95:56:20, 9.58s/it] 40%|████████████████████████████████████▋ | 23959/60000 [52:32:36<94:03:17, 9.39s/it] 40%|████████████████████████████████████▊ | 23979/60000 [52:35:49<95:30:53, 9.55s/it] 40%|████████████████████████████████████▊ | 23999/60000 [52:38:57<93:25:58, 9.34s/it] 40%|████████████████████████████████████▊ | 24000/60000 [52:39:06<93:16:16, 9.33s/it][INFO|trainer.py:3173] 2023-11-21 15:38:21,410 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-21 15:38:21,411 >> Num examples: Unknown 
[INFO|trainer.py:3178] 2023-11-21 15:38:21,411 >> Batch size = 4 Reading metadata...: 1704it [00:00, 8355.48it/s] [INFO|trainer_utils.py:759] 2023-11-21 15:38:22,382 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. {'eval_loss': 0.1917724609375, 'eval_wer': 8.452695062193744, 'eval_runtime': 605.9268, 'eval_samples_per_second': 2.812, 'eval_steps_per_second': 0.703, 'epoch': 0.4} 40%|████████████████████████████████████▊ | 24000/60000 [52:49:12<93:16:16, 9.33s/it][INFO|trainer.py:2896] 2023-11-21 15:48:55,359 >> Saving model checkpoint to ./checkpoint-24000 [INFO|configuration_utils.py:462] 2023-11-21 15:48:55,366 >> Configuration saved in ./checkpoint-24000/config.json [INFO|configuration_utils.py:568] 2023-11-21 15:48:55,371 >> Configuration saved in ./checkpoint-24000/generation_config.json [2023-11-21 15:49:36,231] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step24000 is about to be saved! [2023-11-21 15:49:36,259] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-24000/global_step24000/mp_rank_00_model_states.pt [2023-11-21 15:49:36,260] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-24000/global_step24000/mp_rank_00_model_states.pt... 
[INFO|modeling_utils.py:2194] 2023-11-21 15:49:36,197 >> Model weights saved in ./checkpoint-24000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-21 15:49:36,203 >> Feature extractor saved in ./checkpoint-24000/preprocessor_config.json [2023-11-21 15:49:46,079] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-24000/global_step24000/mp_rank_00_model_states.pt. [2023-11-21 15:49:46,092] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-24000/global_step24000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-21 15:50:18,129] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-24000/global_step24000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-21 15:50:18,137] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-24000/global_step24000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-21 15:50:18,138] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step24000 is ready now! 
[INFO|feature_extraction_utils.py:425] 2023-11-21 15:51:16,618 >> Feature extractor saved in ./preprocessor_config.json 40%|████████████████████████████████████▍ | 24019/60000 [52:55:10<101:14:13, 10.13s/it] 40%|████████████████████████████████████▊ | 24039/60000 [52:58:35<95:51:40, 9.60s/it] 40%|████████████████████████████████████▉ | 24059/60000 [53:01:53<96:28:53, 9.66s/it] 40%|████████████████████████████████████▉ | 24079/60000 [53:05:04<93:45:00, 9.40s/it] 40%|████████████████████████████████████▉ | 24099/60000 [53:08:11<94:26:12, 9.47s/it] 40%|████████████████████████████████████▉ | 24120/60000 [53:11:30<94:43:06, 9.50s/it] 40%|████████████████████████████████████▌ | 24139/60000 [53:14:32<102:08:33, 10.25s/it] 40%|█████████████████████████████████████ | 24159/60000 [53:17:40<94:18:35, 9.47s/it] 40%|█████████████████████████████████████ | 24179/60000 [53:20:47<93:24:27, 9.39s/it] 40%|█████████████████████████████████████ | 24199/60000 [53:23:55<93:35:42, 9.41s/it] 40%|█████████████████████████████████████▏ | 24219/60000 [53:27:01<93:03:50, 9.36s/it] 40%|█████████████████████████████████████▏ | 24240/60000 [53:30:50<93:38:02, 9.43s/it] 40%|█████████████████████████████████████▏ | 24260/60000 [53:33:57<93:01:53, 9.37s/it] 40%|█████████████████████████████████████▏ | 24280/60000 [53:37:06<93:10:14, 9.39s/it] 40%|█████████████████████████████████████▎ | 24300/60000 [53:40:24<94:36:07, 9.54s/it] 41%|████████████████████████████████████▉ | 24319/60000 [53:43:38<103:24:07, 10.43s/it] 41%|█████████████████████████████████████▎ | 24339/60000 [53:46:52<95:14:58, 9.62s/it] 41%|█████████████████████████████████████▎ | 24359/60000 [53:50:03<95:15:48, 9.62s/it] 41%|█████████████████████████████████████▍ | 24380/60000 [53:53:21<92:25:02, 9.34s/it] 41%|█████████████████████████████████████▍ | 24399/60000 [53:56:21<92:54:26, 9.39s/it] 41%|█████████████████████████████████████▍ | 24420/60000 [53:59:46<93:35:17, 9.47s/it] 41%|█████████████████████████████████████▍ | 24440/60000 
[54:02:56<93:23:46, 9.46s/it] 41%|█████████████████████████████████████▌ | 24460/60000 [54:06:06<93:14:41, 9.45s/it] 41%|█████████████████████████████████████▌ | 24480/60000 [54:09:49<96:09:19, 9.75s/it] 41%|█████████████████████████████████████▌ | 24499/60000 [54:12:51<92:48:11, 9.41s/it] 41%|█████████████████████████████████████▌ | 24520/60000 [54:16:07<93:11:53, 9.46s/it] 41%|█████████████████████████████████████▋ | 24540/60000 [54:19:13<90:50:48, 9.22s/it] 41%|█████████████████████████████████████▋ | 24560/60000 [54:22:19<93:38:47, 9.51s/it] 41%|█████████████████████████████████████▋ | 24579/60000 [54:25:15<90:47:11, 9.23s/it] 41%|█████████████████████████████████████▋ | 24600/60000 [54:28:36<92:17:27, 9.39s/it] 41%|█████████████████████████████████████▊ | 24620/60000 [54:31:51<96:41:38, 9.84s/it] 41%|█████████████████████████████████████▊ | 24627/60000 [54:32:57<92:17:47, 9.39s/it] 41%|█████████████████████████████████████▊ | 24628/60000 [54:33:03<82:53:37, 8.44s/it] 41%|█████████████████████████████████████▊ | 24640/60000 [54:34:52<92:02:30, 9.37s/it] Reading metadata...: 2165it [00:00, 13319.12it/s] | 24652/60000 [54:36:44<90:55:05, 9.26s/it] 41%|█████████████████████████████████████▊ | 24659/60000 [54:37:50<91:45:06, 9.35s/it] 41%|█████████████████████████████████████▊ | 24680/60000 [54:41:10<91:43:57, 9.35s/it] 41%|█████████████████████████████████████▍ | 24699/60000 [54:44:23<115:24:07, 11.77s/it] 41%|█████████████████████████████████████▉ | 24719/60000 [54:47:32<93:00:42, 9.49s/it] Reading metadata...: 1650it [00:00, 2908.17it/s] | 24721/60000 [54:47:51<93:16:47, 9.52s/it] 41%|█████████████████████████████████████▉ | 24739/60000 [54:50:45<92:33:29, 9.45s/it] 41%|█████████████████████████████████████▉ | 24760/60000 [54:54:07<95:11:39, 9.72s/it] 41%|█████████████████████████████████████▉ | 24779/60000 [54:57:14<93:08:34, 9.52s/it] 41%|█████████████████████████████████████▌ | 24799/60000 [55:01:03<104:11:49, 10.66s/it] 
41%|█████████████████████████████████████▋ | 24820/60000 [55:04:43<103:31:37, 10.59s/it] 41%|██████████████████████████████████████ | 24840/60000 [55:08:07<95:13:26, 9.75s/it] 41%|██████████████████████████████████████ | 24860/60000 [55:11:22<93:29:32, 9.58s/it] 41%|██████████████████████████████████████▏ | 24880/60000 [55:14:37<98:38:23, 10.11s/it] 42%|██████████████████████████████████████▏ | 24900/60000 [55:17:49<91:21:24, 9.37s/it] 42%|██████████████████████████████████████▏ | 24920/60000 [55:21:03<94:23:00, 9.69s/it] 42%|██████████████████████████████████████▏ | 24940/60000 [55:24:19<95:10:24, 9.77s/it] 42%|██████████████████████████████████████▎ | 24960/60000 [55:28:09<90:13:15, 9.27s/it] 42%|██████████████████████████████████████▎ | 24979/60000 [55:31:40<95:02:47, 9.77s/it] 42%|██████████████████████████████████████▎ | 25000/60000 [55:35:01<93:04:09, 9.57s/it][INFO|trainer.py:3173] 2023-11-21 18:34:15,793 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-21 18:34:15,793 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-21 18:34:15,793 >> Batch size = 4 Reading metadata...: 1it [00:00, 6.16it/s] [INFO|trainer_utils.py:759] 2023-11-21 18:34:17,113 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 
{'eval_loss': 0.1820068359375, 'eval_wer': 8.311345646437996, 'eval_runtime': 612.2957, 'eval_samples_per_second': 2.783, 'eval_steps_per_second': 0.696, 'epoch': 0.42} 42%|██████████████████████████████████████▎ | 25000/60000 [55:45:13<93:04:09, 9.57s/it][INFO|trainer.py:2896] 2023-11-21 18:45:00,361 >> Saving model checkpoint to ./checkpoint-25000 [INFO|configuration_utils.py:462] 2023-11-21 18:45:00,373 >> Configuration saved in ./checkpoint-25000/config.json [INFO|configuration_utils.py:568] 2023-11-21 18:45:00,381 >> Configuration saved in ./checkpoint-25000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-21 18:45:42,429 >> Model weights saved in ./checkpoint-25000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-21 18:45:42,455 >> Feature extractor saved in ./checkpoint-25000/preprocessor_config.json [2023-11-21 18:45:42,860] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step25000 is about to be saved! [2023-11-21 18:45:42,898] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-25000/global_step25000/mp_rank_00_model_states.pt [2023-11-21 18:45:42,898] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-25000/global_step25000/mp_rank_00_model_states.pt... [2023-11-21 18:45:48,554] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-25000/global_step25000/mp_rank_00_model_states.pt. [2023-11-21 18:45:48,559] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-25000/global_step25000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-21 18:46:09,786] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-25000/global_step25000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2023-11-21 18:46:09,795] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-25000/global_step25000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-21 18:46:09,796] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step25000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-21 18:47:10,590 >> Feature extractor saved in ./preprocessor_config.json 42%|█████████████████████████████████████▉ | 25019/60000 [55:51:06<100:19:49, 10.33s/it] 42%|██████████████████████████████████████▍ | 25039/60000 [55:54:26<92:29:28, 9.52s/it] 42%|██████████████████████████████████████▍ | 25059/60000 [55:57:46<99:09:24, 10.22s/it] 42%|██████████████████████████████████████▍ | 25079/60000 [56:01:08<94:34:43, 9.75s/it] 42%|██████████████████████████████████████▍ | 25099/60000 [56:04:20<96:22:22, 9.94s/it] 42%|██████████████████████████████████████▌ | 25119/60000 [56:07:42<94:45:09, 9.78s/it] 42%|██████████████████████████████████████▌ | 25139/60000 [56:11:00<95:48:51, 9.89s/it] 42%|██████████████████████████████████████▌ | 25159/60000 [56:14:13<91:31:31, 9.46s/it] 42%|██████████████████████████████████████▌ | 25179/60000 [56:17:29<92:59:29, 9.61s/it] 42%|██████████████████████████████████████▏ | 25200/60000 [56:21:05<101:06:36, 10.46s/it] 42%|██████████████████████████████████████▋ | 25219/60000 [56:24:10<91:26:56, 9.47s/it] 42%|██████████████████████████████████████▋ | 25239/60000 [56:27:29<98:01:52, 10.15s/it] 42%|██████████████████████████████████████▋ | 25259/60000 [56:30:51<99:18:45, 10.29s/it] 42%|██████████████████████████████████████▊ | 25279/60000 [56:34:11<95:59:56, 9.95s/it] 42%|██████████████████████████████████████▊ | 25300/60000 [56:37:56<92:37:56, 9.61s/it] 42%|██████████████████████████████████████▊ | 25319/60000 [56:41:01<94:00:31, 9.76s/it] 42%|██████████████████████████████████████▊ | 25340/60000 [56:44:29<98:27:29, 10.23s/it] 42%|██████████████████████████████████████▉ | 25359/60000 [56:47:40<96:51:19, 
10.07s/it] 42%|██████████████████████████████████████▍ | 25379/60000 [56:51:26<145:20:28, 15.11s/it] 42%|██████████████████████████████████████▉ | 25399/60000 [56:54:43<92:04:11, 9.58s/it] 42%|██████████████████████████████████████▉ | 25420/60000 [56:58:10<94:13:19, 9.81s/it] 42%|███████████████████████████████████████ | 25439/60000 [57:01:17<95:44:06, 9.97s/it] 42%|███████████████████████████████████████ | 25459/60000 [57:04:35<93:50:31, 9.78s/it] 42%|███████████████████████████████████████ | 25479/60000 [57:07:54<91:49:50, 9.58s/it] 42%|███████████████████████████████████████ | 25500/60000 [57:11:28<99:11:30, 10.35s/it] 43%|███████████████████████████████████████▏ | 25520/60000 [57:14:55<94:17:46, 9.85s/it] 43%|███████████████████████████████████████▏ | 25540/60000 [57:18:29<95:40:40, 10.00s/it] 43%|███████████████████████████████████████▏ | 25560/60000 [57:21:45<90:25:38, 9.45s/it] 43%|███████████████████████████████████████▏ | 25579/60000 [57:24:51<95:46:13, 10.02s/it] 43%|███████████████████████████████████████▎ | 25600/60000 [57:28:20<95:11:54, 9.96s/it] 43%|███████████████████████████████████████▎ | 25619/60000 [57:31:31<97:22:21, 10.20s/it] 43%|███████████████████████████████████████▎ | 25629/60000 [57:33:06<89:37:29, 9.39s/it] 43%|███████████████████████████████████████▎ | 25630/60000 [57:33:13<82:25:03, 8.63s/it] 43%|██████████████████████████████████████▉ | 25639/60000 [57:34:40<103:36:33, 10.86s/it] 43%|███████████████████████████████████████▎ | 25659/60000 [57:38:06<93:07:19, 9.76s/it] 43%|███████████████████████████████████████▎ | 25679/60000 [57:41:18<91:18:35, 9.58s/it] 43%|██████████████████████████████████████▉ | 25699/60000 [57:44:56<162:20:20, 17.04s/it] Reading metadata...: 1650it [00:00, 7197.96it/s] | 25711/60000 [57:46:52<93:40:48, 9.84s/it] 43%|███████████████████████████████████████▍ | 25720/60000 [57:48:19<91:01:44, 9.56s/it] 43%|███████████████████████████████████████▍ | 25740/60000 [57:51:39<93:42:03, 9.85s/it] 
43%|███████████████████████████████████████▍ | 25759/60000 [57:54:45<91:10:14, 9.59s/it] 43%|███████████████████████████████████████▌ | 25780/60000 [57:58:11<93:52:05, 9.88s/it] 43%|███████████████████████████████████████▌ | 25799/60000 [58:01:21<97:04:12, 10.22s/it] 43%|███████████████████████████████████████▌ | 25820/60000 [58:04:53<93:29:26, 9.85s/it] 43%|███████████████████████████████████████▌ | 25840/60000 [58:08:08<93:01:06, 9.80s/it] 43%|███████████████████████████████████████▏ | 25859/60000 [58:11:29<106:48:54, 11.26s/it] 43%|███████████████████████████████████████▎ | 25880/60000 [58:15:16<100:05:48, 10.56s/it] 43%|███████████████████████████████████████▎ | 25900/60000 [58:18:54<103:24:18, 10.92s/it] 43%|███████████████████████████████████████▎ | 25920/60000 [58:22:40<102:46:12, 10.86s/it] 43%|███████████████████████████████████████▎ | 25940/60000 [58:26:19<102:57:50, 10.88s/it] 43%|███████████████████████████████████████▎ | 25951/60000 [58:28:27<109:01:34, 11.53s/it] 43%|███████████████████████████████████████▊ | 25959/60000 [58:29:51<96:50:12, 10.24s/it] 43%|███████████████████████████████████████▊ | 25980/60000 [58:33:22<95:53:08, 10.15s/it] 43%|███████████████████████████████████████▊ | 26000/60000 [58:36:57<98:33:45, 10.44s/it][INFO|trainer.py:3173] 2023-11-21 21:36:12,243 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-21 21:36:12,244 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-21 21:36:12,245 >> Batch size = 4 Reading metadata...: 0it [00:00, ?it/s] [INFO|trainer_utils.py:759] 2023-11-21 21:36:13,298 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale. 
If client_id, up_votes, input_length, segment, age, down_votes, gender, accent, path, locale are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. {'eval_loss': 0.197509765625, 'eval_wer': 8.301922352054278, 'eval_runtime': 649.294, 'eval_samples_per_second': 2.624, 'eval_steps_per_second': 0.656, 'epoch': 0.43} 43%|███████████████████████████████████████▊ | 26000/60000 [58:47:47<98:33:45, 10.44s/it][INFO|trainer.py:2896] 2023-11-21 21:47:29,639 >> Saving model checkpoint to ./checkpoint-26000 [INFO|configuration_utils.py:462] 2023-11-21 21:47:29,648 >> Configuration saved in ./checkpoint-26000/config.json [INFO|configuration_utils.py:568] 2023-11-21 21:47:29,654 >> Configuration saved in ./checkpoint-26000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-21 21:48:11,967 >> Model weights saved in ./checkpoint-26000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-21 21:48:11,976 >> Feature extractor saved in ./checkpoint-26000/preprocessor_config.json [2023-11-21 21:48:12,023] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step26000 is about to be saved! [2023-11-21 21:48:12,057] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-26000/global_step26000/mp_rank_00_model_states.pt [2023-11-21 21:48:12,057] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-26000/global_step26000/mp_rank_00_model_states.pt... [2023-11-21 21:48:18,344] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-26000/global_step26000/mp_rank_00_model_states.pt. [2023-11-21 21:48:18,353] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-26000/global_step26000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-21 21:48:43,408] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-26000/global_step26000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2023-11-21 21:48:43,420] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-26000/global_step26000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-21 21:48:43,421] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step26000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-21 21:49:50,035 >> Feature extractor saved in ./preprocessor_config.json 43%|███████████████████████████████████████▉ | 26020/60000 [58:55:11<98:22:34, 10.42s/it] 43%|███████████████████████████████████████▉ | 26040/60000 [58:58:35<98:01:10, 10.39s/it] 43%|███████████████████████████████████████▉ | 26060/60000 [59:02:08<99:53:05, 10.59s/it] 43%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 26080/60000 [59:05:57<154:00:39, 16.35s/it] 44%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 26100/60000 [59:09:41<102:52:30, 10.92s/it] 44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 26120/60000 [59:13:12<97:03:22, 10.31s/it] 44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 26140/60000 [59:16:51<102:12:37, 10.87s/it] 44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 26160/60000 [59:20:28<99:36:20, 10.60s/it] 44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 26179/60000 [59:24:01<100:50:54, 10.73s/it] 44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 26200/60000 [59:27:46<99:39:12, 10.61s/it] 
44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 26219/60000 [59:31:13<103:37:30, 11.04s/it] 44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 26239/60000 [59:34:58<102:30:57, 10.93s/it] 44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 26259/60000 [59:38:38<100:57:00, 10.77s/it] 44%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 26279/60000 [59:42:33<133:37:48, 14.27s/it] 44%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 26299/60000 [59:46:26<99:23:57, 10.62s/it] 44%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 26319/60000 [59:50:01<96:12:37, 10.28s/it] 44%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 26340/60000 [59:53:36<93:07:52, 9.96s/it] 44%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 26360/60000 [59:57:08<100:57:02, 10.80s/it] 44%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 26379/60000 [60:00:32<99:36:38, 10.67s/it] 44%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 26399/60000 [60:04:05<97:20:23, 10.43s/it] 44%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 26420/60000 
[60:07:53<102:31:13, 10.99s/it] 44%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 26432/60000 [60:10:06<109:13:12, 11.71s/it]
Traceback (most recent call last):
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
    response.raise_for_status()
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/datasets/Finnish-NLP/aalto_eduskunta_asr_audio_processed/resolve/3bdd45efc0b1a61f49cc79e94eb66e15c4432c89/data/train-00034-of-00064-2ff163d1bb53e2f0.parquet

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/e/run_speech_recognition_seq2seq_streaming.py", line 679, in <module>
    main()
  File "/mnt/e/run_speech_recognition_seq2seq_streaming.py", line 628, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/transformers/trainer.py", line 1547, in train
    return inner_training_loop(
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/transformers/trainer.py", line 1839, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/accelerate/data_loader.py", line 675, in __iter__
    next_batch, next_batch_info = self._fetch_batches(main_iterator)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/accelerate/data_loader.py", line 604, in _fetch_batches
    batches.append(next(iterator))
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1379, in __iter__
    for key, example in ex_iterable:
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 862, in __iter__
    yield from self._iter()
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 899, in _iter
    for key, example in iterator:
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 982, in __iter__
    for x in self.ex_iterable:
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 678, in __iter__
    yield from self._iter()
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 740, in _iter
    for key, example in iterator:
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1114, in __iter__
    for key, example in self.ex_iterable:
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 429, in __iter__
    if not iterators[i].hasnext():
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 106, in hasnext
    self._thenext = next(self.it)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 678, in __iter__
    yield from self._iter()
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 740, in _iter
    for key, example in iterator:
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1114, in __iter__
    for key, example in self.ex_iterable:
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 320, in __iter__
    for key, pa_table in self.generate_tables_fn(**kwargs_with_shuffled_shards):
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/packaged_modules/parquet/parquet.py", line 85, in _generate_tables
    parquet_file = pq.ParquetFile(f)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 341, in __init__
    self.reader.open(
  File "pyarrow/_parquet.pyx", line 1249, in pyarrow._parquet.ParquetReader.open
  File "pyarrow/types.pxi", line 88, in pyarrow.lib._datatype_to_pep3118
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/datasets/download/streaming_download_manager.py", line 333, in read_with_retries
    out = read(*args, **kwargs)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/fsspec/spec.py", line 1856, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/fsspec/caching.py", line 189, in _fetch
    self.cache = self.fetcher(start, end) # new block replaces old
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/huggingface_hub/hf_file_system.py", line 445, in _fetch_range
    hf_raise_for_status(r)
  File "/home/rasmus/miniconda3/envs/WhisperFinetuneEnv/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 330, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/datasets/Finnish-NLP/aalto_eduskunta_asr_audio_processed/resolve/3bdd45efc0b1a61f49cc79e94eb66e15c4432c89/data/train-00034-of-00064-2ff163d1bb53e2f0.parquet (Request ID: Root=1-655d1c83-6ac0a7bb67f4b01056d447b6;974175f5-471f-4e9c-a910-35fa68eae047)

Internal Error - We're working hard to fix this as soon as possible!