Reading metadata...: 2165it [00:00, 7592.80it/s]
| 0/60000 [00:00> The following columns in the training set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: input_length. If input_length are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.
Reading metadata...: 1650it [00:00, 3986.08it/s]
Reading metadata...: 2165it [00:00, 13545.67it/s]
Reading metadata...: 1650it [00:00, 10348.13it/s]
Reading metadata...: 2165it [00:00, 12831.95it/s]
Reading metadata...: 1650it [00:00, 6479.01it/s]
Reading metadata...: 2165it [00:00, 14459.50it/s]
Reading metadata...: 1650it [00:00, 10677.16it/s]
Reading metadata...: 1650it [00:00, 9893.23it/s]
Reading metadata...: 2165it [00:00, 13273.75it/s]
Reading metadata...: 1650it [00:00, 10247.75it/s]
Reading metadata...: 2165it [00:00, 13002.70it/s]
Reading metadata...: 1650it [00:00, 10760.26it/s]
Reading metadata...: 1it [00:00, 6.83it/s]
Reading metadata...: 1650it [00:00, 10429.30it/s]
Reading metadata...: 1650it [00:00, 10010.39it/s]
Reading metadata...: 1it [00:00, 6.32it/s]
Reading metadata...: 1650it [00:00, 9846.96it/s]
Reading metadata...: 2165it [00:00, 14576.65it/s]
Reading metadata...: 1650it [00:00, 10496.91it/s]
Reading metadata...: 1it [00:00, 6.65it/s]
Reading metadata...: 1650it [00:00, 10731.98it/s]
Reading metadata...: 1650it [00:00, 10192.94it/s]
Reading metadata...: 2165it [00:00, 14388.52it/s]
Reading metadata...: 1650it [00:00, 10318.18it/s]
Reading metadata...: 2165it [00:00, 10779.99it/s]
Reading metadata...: 1650it [00:00, 10400.30it/s]
Reading metadata...: 1it [00:00, 6.62it/s]
Reading metadata...: 1650it [00:00, 10310.88it/s]
Reading metadata...: 1it [00:00, 6.52it/s]
Reading metadata...: 2165it [00:00, 13536.12it/s]
Reading metadata...: 1650it [00:00, 10202.73it/s]
Reading metadata...: 2165it [00:00, 13215.47it/s]
Reading metadata...: 1it [00:00, 6.41it/s]
Reading metadata...: 2165it [00:00, 13018.82it/s]
Reading metadata...: 1650it [00:00, 9363.43it/s] Reading metadata...: 2165it [00:00, 13756.34it/s] Reading metadata...: 1650it [00:00, 10248.05it/s] Reading metadata...: 1650it [00:00, 10028.17it/s] Reading metadata...: 2165it [00:00, 8726.08it/s] Reading metadata...: 1650it [00:00, 10196.52it/s] Reading metadata...: 2165it [00:00, 12341.08it/s] Reading metadata...: 1650it [00:00, 10403.48it/s] Reading metadata...: 2165it [00:00, 13263.85it/s] Reading metadata...: 1650it [00:00, 10447.25it/s] Reading metadata...: 1650it [00:00, 10033.01it/s] Reading metadata...: 2165it [00:00, 13651.16it/s] Reading metadata...: 1650it [00:00, 10463.47it/s] Reading metadata...: 2165it [00:00, 13476.50it/s] Reading metadata...: 1650it [00:00, 10691.23it/s] [WARNING|logging.py:329] 2023-11-23 05:49:00,929 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 47%|█████████▊ | 28020/60000 [6:11:58<16:02:04, 1.81s/it] 47%|█████████▊ | 28039/60000 [6:15:03<83:34:21, 9.41s/it] 47%|█████████▊ | 28059/60000 [6:18:15<87:37:17, 9.88s/it] 47%|█████████▊ | 28080/60000 [6:21:46<88:50:12, 10.02s/it] 47%|█████████▎ | 28100/60000 [6:25:28<142:16:10, 16.06s/it] 47%|█████████▊ | 28119/60000 [6:28:41<86:47:07, 9.80s/it] 47%|█████████▊ | 28140/60000 [6:32:05<87:28:40, 9.88s/it] 47%|█████████▊ | 28160/60000 [6:35:19<85:47:42, 9.70s/it] 47%|█████████▊ | 28180/60000 [6:38:34<87:03:37, 9.85s/it] 47%|█████████▊ | 28200/60000 [6:41:54<87:27:36, 9.90s/it] 47%|█████████▉ | 28219/60000 [6:44:58<85:41:44, 9.71s/it] 47%|█████████▉ | 28240/60000 [6:48:23<86:56:11, 9.85s/it] 47%|█████████▉ | 28260/60000 [6:51:36<84:14:12, 9.55s/it] 47%|█████████▉ | 28280/60000 [6:55:04<99:07:26, 11.25s/it] 47%|█████████▉ | 28299/60000 [6:58:15<86:40:19, 9.84s/it] 47%|█████████▉ | 28320/60000 [7:01:37<84:41:55, 9.62s/it] 47%|█████████▉ | 28340/60000 [7:04:49<83:56:16, 9.54s/it] 47%|█████████▉ | 28360/60000 [7:08:03<84:54:52, 9.66s/it] 47%|█████████▉ | 28380/60000 [7:11:24<86:37:21, 
9.86s/it] 47%|█████████▉ | 28400/60000 [7:14:39<84:44:56, 9.65s/it] 47%|█████████▉ | 28419/60000 [7:17:42<84:39:26, 9.65s/it] 47%|█████████▉ | 28440/60000 [7:21:07<84:52:30, 9.68s/it] 47%|█████████▉ | 28460/60000 [7:24:25<97:23:39, 11.12s/it] 47%|█████████▉ | 28480/60000 [7:27:40<84:47:51, 9.69s/it] 47%|█████████▉ | 28499/60000 [7:30:43<85:34:59, 9.78s/it] 48%|█████████▉ | 28519/60000 [7:34:53<90:24:03, 10.34s/it] 48%|█████████▉ | 28539/60000 [7:38:18<83:19:22, 9.53s/it] Reading metadata...: 2165it [00:00, 13134.50it/s]83:34:36, 9.57s/it] 48%|█████████▉ | 28559/60000 [7:41:37<85:19:21, 9.77s/it] 48%|██████████ | 28579/60000 [7:44:47<83:35:47, 9.58s/it] 48%|██████████ | 28599/60000 [7:47:59<83:19:47, 9.55s/it] 48%|██████████ | 28619/60000 [7:51:25<84:20:22, 9.68s/it] 48%|██████████ | 28635/60000 [7:54:01<84:19:29, 9.68s/it] 48%|██████████ | 28637/60000 [7:54:18<81:43:30, 9.38s/it] 48%|█████████▌ | 28639/60000 [7:55:01<145:12:54, 16.67s/it] 48%|██████████ | 28659/60000 [7:58:16<86:30:01, 9.94s/it] 48%|██████████ | 28679/60000 [8:01:28<83:28:52, 9.60s/it] Reading metadata...: 1650it [00:00, 4787.14it/s]<84:57:46, 9.77s/it] 48%|██████████ | 28699/60000 [8:04:41<83:24:06, 9.59s/it] 48%|██████████ | 28719/60000 [8:07:52<82:36:44, 9.51s/it] 48%|██████████ | 28739/60000 [8:11:10<82:36:02, 9.51s/it] 48%|██████████ | 28759/60000 [8:14:21<83:24:13, 9.61s/it] 48%|██████████ | 28779/60000 [8:17:34<82:40:17, 9.53s/it] 48%|██████████ | 28799/60000 [8:20:46<82:06:57, 9.47s/it] 48%|██████████ | 28819/60000 [8:24:07<86:53:33, 10.03s/it] 48%|██████████ | 28839/60000 [8:27:18<83:09:46, 9.61s/it] 48%|██████████ | 28859/60000 [8:30:28<81:41:17, 9.44s/it] 48%|██████████ | 28879/60000 [8:33:39<84:01:01, 9.72s/it] 48%|██████████ | 28899/60000 [8:36:49<82:33:38, 9.56s/it] 48%|██████████ | 28919/60000 [8:40:06<81:32:21, 9.44s/it] 48%|██████████▏ | 28939/60000 [8:43:19<81:49:47, 9.48s/it] 48%|██████████▏ | 28959/60000 [8:46:38<82:14:03, 9.54s/it] 48%|██████████▏ | 28979/60000 
[8:49:49<82:48:07, 9.61s/it] 48%|██████████▏ | 29000/60000 [8:53:14<82:32:56, 9.59s/it][INFO|trainer.py:3173] 2023-11-23 08:35:01,681 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-23 08:35:01,682 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-23 08:35:01,682 >> Batch size = 4 Reading metadata...: 1704it [00:00, 10656.43it/s] Reading metadata...: 1it [00:00, 6.65it/s] [INFO|trainer_utils.py:759] 2023-11-23 08:35:02,636 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: down_votes, up_votes, gender, locale, segment, client_id, accent, input_length, path, age. If down_votes, up_votes, gender, locale, segment, client_id, accent, input_length, path, age are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. {'eval_loss': 0.19091796875, 'eval_wer': 8.330192235205427, 'eval_runtime': 607.5708, 'eval_samples_per_second': 2.805, 'eval_steps_per_second': 0.701, 'epoch': 0.48} 48%|██████████▏ | 29000/60000 [9:03:22<82:32:56, 9.59s/it][INFO|trainer.py:2896] 2023-11-23 08:45:12,972 >> Saving model checkpoint to ./checkpoint-29000 [INFO|configuration_utils.py:462] 2023-11-23 08:45:12,986 >> Configuration saved in ./checkpoint-29000/config.json [INFO|configuration_utils.py:568] 2023-11-23 08:45:12,991 >> Configuration saved in ./checkpoint-29000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-23 08:45:41,039 >> Model weights saved in ./checkpoint-29000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-23 08:45:41,047 >> Feature extractor saved in ./checkpoint-29000/preprocessor_config.json [2023-11-23 08:45:41,103] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step29000 is about to be saved! 
[2023-11-23 08:45:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-29000/global_step29000/mp_rank_00_model_states.pt
[2023-11-23 08:45:41,140] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-29000/global_step29000/mp_rank_00_model_states.pt...
[2023-11-23 08:46:03,560] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-29000/global_step29000/mp_rank_00_model_states.pt.
[2023-11-23 08:46:35,716] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-29000/global_step29000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2023-11-23 08:47:36,799] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-29000/global_step29000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2023-11-23 08:47:36,821] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-29000/global_step29000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2023-11-23 08:47:36,827] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step29000 is ready now!
[INFO|feature_extraction_utils.py:425] 2023-11-23 08:48:33,268 >> Feature extractor saved in ./preprocessor_config.json 48%|██████████▏ | 29019/60000 [9:10:26<91:11:36, 10.60s/it] 48%|██████████▏ | 29039/60000 [9:13:46<88:03:50, 10.24s/it] 48%|██████████▏ | 29059/60000 [9:17:04<84:00:37, 9.77s/it] 48%|██████████▏ | 29080/60000 [9:20:39<92:22:52, 10.76s/it] 48%|██████████▏ | 29099/60000 [9:23:43<83:24:26, 9.72s/it] 49%|██████████▏ | 29119/60000 [9:26:58<81:51:00, 9.54s/it] 49%|██████████▏ | 29139/60000 [9:30:11<82:11:33, 9.59s/it] 49%|██████████▏ | 29159/60000 [9:33:25<82:17:32, 9.61s/it] 49%|██████████▏ | 29179/60000 [9:36:49<90:12:46, 10.54s/it] 49%|██████████▏ | 29200/60000 [9:40:24<87:33:43, 10.23s/it] 49%|██████████▏ | 29220/60000 [9:43:42<84:20:19, 9.86s/it] 49%|██████████▏ | 29239/60000 [9:46:50<83:21:09, 9.75s/it] 49%|██████████▏ | 29260/60000 [9:50:23<86:01:10, 10.07s/it] 49%|██████████▏ | 29279/60000 [9:53:44<95:31:58, 11.19s/it] 49%|██████████▎ | 29299/60000 [9:57:12<83:44:55, 9.82s/it] 49%|█████████▊ | 29319/60000 [10:00:28<82:02:13, 9.63s/it] 49%|█████████▊ | 29340/60000 [10:03:52<83:00:25, 9.75s/it] 49%|█████████▊ | 29360/60000 [10:07:10<83:05:27, 9.76s/it] 49%|█████████▊ | 29380/60000 [10:10:46<88:50:51, 10.45s/it] 49%|█████████▊ | 29399/60000 [10:13:50<81:25:21, 9.58s/it] 49%|█████████▊ | 29419/60000 [10:17:02<81:32:06, 9.60s/it] 49%|█████████▊ | 29439/60000 [10:20:20<82:08:03, 9.68s/it] 49%|█████████▊ | 29459/60000 [10:23:49<84:20:46, 9.94s/it] 49%|█████████▊ | 29479/60000 [10:27:05<83:13:29, 9.82s/it] 49%|█████████▊ | 29499/60000 [10:30:17<81:34:06, 9.63s/it] 49%|█████████▊ | 29519/60000 [10:33:33<90:22:14, 10.67s/it] 49%|█████████▊ | 29539/60000 [10:36:46<81:25:44, 9.62s/it] 49%|█████████▊ | 29560/60000 [10:40:04<80:49:14, 9.56s/it] 49%|█████████▊ | 29579/60000 [10:43:05<81:37:34, 9.66s/it] 49%|█████████▊ | 29599/60000 [10:46:20<81:06:14, 9.60s/it] 49%|█████████▊ | 29619/60000 [10:49:50<86:24:20, 10.24s/it] 49%|█████████▉ | 29638/60000 
[10:52:51<72:37:00, 8.61s/it] 49%|█████████▉ | 29639/60000 [10:52:58<67:10:10, 7.96s/it] [2023-11-23 10:34:45,267] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 49%|█████████▉ | 29660/60000 [10:56:17<80:12:33, 9.52s/it] Reading metadata...: 1650it [00:00, 9494.14it/s]<81:33:52, 9.68s/it] 49%|█████████▉ | 29679/60000 [10:59:23<81:41:42, 9.70s/it] 49%|█████████▉ | 29699/60000 [11:02:42<85:14:16, 10.13s/it] 50%|█████████▉ | 29719/60000 [11:05:53<80:14:58, 9.54s/it] 50%|█████████▉ | 29740/60000 [11:09:12<80:22:17, 9.56s/it] 50%|█████████▉ | 29760/60000 [11:12:27<81:09:55, 9.66s/it] 50%|█████████▉ | 29779/60000 [11:15:29<80:25:00, 9.58s/it] 50%|█████████▉ | 29800/60000 [11:19:00<81:34:22, 9.72s/it] 50%|█████████▉ | 29820/60000 [11:22:11<79:15:11, 9.45s/it] 50%|█████████▉ | 29840/60000 [11:25:23<79:57:25, 9.54s/it] Reading metadata...: 2165it [00:00, 3642.98it/s]<80:42:51, 9.64s/it] 50%|█████████▉ | 29859/60000 [11:28:29<80:50:42, 9.66s/it] 50%|█████████▉ | 29880/60000 [11:31:58<81:50:14, 9.78s/it] 50%|█████████▉ | 29899/60000 [11:34:59<79:31:12, 9.51s/it] 50%|█████████▉ | 29919/60000 [11:38:37<99:04:03, 11.86s/it] 50%|█████████▉ | 29939/60000 [11:41:54<82:04:31, 9.83s/it] 50%|█████████▉ | 29960/60000 [11:45:33<93:34:41, 11.21s/it] 50%|█████████▉ | 29979/60000 [11:48:38<81:32:05, 9.78s/it] 50%|██████████ | 30000/60000 [11:51:59<78:56:02, 9.47s/it][INFO|trainer.py:3173] 2023-11-23 11:33:47,040 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-23 11:33:47,041 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-23 11:33:47,041 >> Batch size = 4 Reading metadata...: 1704it [00:00, 8465.87it/s] Reading metadata...: 1it [00:00, 6.49it/s] [INFO|trainer_utils.py:759] 2023-11-23 11:33:48,041 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: down_votes, up_votes, gender, 
locale, segment, client_id, accent, input_length, path, age. If down_votes, up_votes, gender, locale, segment, client_id, accent, input_length, path, age are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 50%|██████████ | 30000/60000 [12:02:00<78:56:02, 9.47s/it] 50%|██████████ | 30000/60000 [12:02:00<78:56:02, 9.47s/it][INFO|trainer.py:2896] 2023-11-23 11:44:15,860 >> Saving model checkpoint to ./checkpoint-30000 [INFO|configuration_utils.py:462] 2023-11-23 11:44:15,870 >> Configuration saved in ./checkpoint-30000/config.json [INFO|configuration_utils.py:568] 2023-11-23 11:44:15,875 >> Configuration saved in ./checkpoint-30000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-23 11:45:02,136 >> Model weights saved in ./checkpoint-30000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-23 11:45:02,141 >> Feature extractor saved in ./checkpoint-30000/preprocessor_config.json [2023-11-23 11:45:02,169] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step30000 is about to be saved! [2023-11-23 11:45:02,193] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-30000/global_step30000/mp_rank_00_model_states.pt [2023-11-23 11:45:02,194] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-30000/global_step30000/mp_rank_00_model_states.pt... [2023-11-23 11:45:08,737] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-30000/global_step30000/mp_rank_00_model_states.pt. [2023-11-23 11:45:08,744] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-30000/global_step30000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-23 11:45:36,754] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-30000/global_step30000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2023-11-23 11:45:36,765] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-30000/global_step30000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-23 11:45:36,767] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step30000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-23 11:46:33,500 >> Feature extractor saved in ./preprocessor_config.json 50%|██████████ | 30020/60000 [12:08:10<85:33:08, 10.27s/it] 50%|██████████ | 30039/60000 [12:11:36<88:09:32, 10.59s/it] 50%|██████████ | 30059/60000 [12:14:58<82:44:30, 9.95s/it] 50%|██████████ | 30079/60000 [12:18:14<80:32:25, 9.69s/it] 50%|██████████ | 30099/60000 [12:21:31<80:38:44, 9.71s/it] 50%|██████████ | 30120/60000 [12:25:23<83:32:40, 10.07s/it] 50%|██████████ | 30140/60000 [12:28:42<82:54:00, 9.99s/it] 50%|██████████ | 30160/60000 [12:31:52<78:04:06, 9.42s/it] 50%|██████████ | 30180/60000 [12:35:03<79:27:09, 9.59s/it] 50%|██████████ | 30199/60000 [12:38:06<79:25:53, 9.60s/it] 50%|██████████ | 30219/60000 [12:41:16<78:47:02, 9.52s/it] 50%|██████████ | 30240/60000 [12:44:41<77:42:06, 9.40s/it] 50%|██████████ | 30260/60000 [12:47:50<78:27:31, 9.50s/it] 50%|██████████ | 30280/60000 [12:51:14<84:49:11, 10.27s/it] 50%|██████████ | 30300/60000 [12:54:59<93:55:24, 11.38s/it] 51%|██████████ | 30320/60000 [12:58:16<79:14:44, 9.61s/it] 51%|██████████ | 30340/60000 [13:01:28<79:01:40, 9.59s/it] 51%|██████████ | 30360/60000 [13:04:38<77:57:09, 9.47s/it] 51%|██████████▏ | 30379/60000 [13:07:40<78:18:06, 9.52s/it] 51%|██████████▏ | 30400/60000 [13:11:02<79:33:20, 9.68s/it] 51%|██████████▏ | 30420/60000 [13:14:40<80:27:51, 9.79s/it] 51%|██████████▏ | 30439/60000 [13:17:44<80:26:22, 9.80s/it] 51%|██████████▏ | 30460/60000 [13:21:07<78:05:36, 9.52s/it] 51%|██████████▏ | 30480/60000 [13:24:20<78:26:07, 9.57s/it] 51%|██████████▏ | 30500/60000 [13:27:41<79:28:50, 9.70s/it] 51%|██████████▏ | 30520/60000 [13:30:53<78:24:01, 9.57s/it] 51%|██████████▏ | 30540/60000 
[13:34:06<78:36:09, 9.61s/it] 51%|██████████▏ | 30560/60000 [13:37:18<81:20:11, 9.95s/it] 51%|██████████▏ | 30580/60000 [13:40:36<82:45:06, 10.13s/it] 51%|██████████▏ | 30600/60000 [13:43:49<78:35:08, 9.62s/it] 51%|██████████▏ | 30620/60000 [13:47:10<78:22:15, 9.60s/it] 51%|██████████▏ | 30640/60000 [13:50:20<70:25:28, 8.64s/it] [2023-11-23 13:32:07,459] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 51%|██████████▏ | 30641/60000 [13:50:26<64:34:06, 7.92s/it] 51%|██████████▏ | 30660/60000 [13:53:29<79:20:08, 9.73s/it] Reading metadata...: 1650it [00:00, 10271.70it/s]78:55:47, 9.68s/it] 51%|██████████▏ | 30680/60000 [13:56:51<78:41:06, 9.66s/it] 51%|██████████▏ | 30700/60000 [14:00:04<78:09:47, 9.60s/it] 51%|██████████▏ | 30719/60000 [14:03:08<79:19:49, 9.75s/it] 51%|██████████▏ | 30740/60000 [14:06:32<79:48:04, 9.82s/it] 51%|██████████▎ | 30760/60000 [14:09:50<80:23:14, 9.90s/it] 51%|██████████▎ | 30780/60000 [14:13:03<78:14:38, 9.64s/it] 51%|█████████▊ | 30800/60000 [14:16:44<100:31:41, 12.39s/it] 51%|██████████▎ | 30820/60000 [14:20:09<80:03:04, 9.88s/it] 51%|██████████▎ | 30840/60000 [14:23:52<78:43:13, 9.72s/it] 51%|██████████▎ | 30860/60000 [14:27:41<80:51:40, 9.99s/it] 51%|██████████▎ | 30880/60000 [14:30:55<78:07:44, 9.66s/it] 52%|██████████▎ | 30900/60000 [14:34:12<78:39:08, 9.73s/it] 52%|██████████▎ | 30920/60000 [14:37:26<78:29:36, 9.72s/it] 52%|██████████▎ | 30940/60000 [14:40:44<77:20:19, 9.58s/it] 52%|██████████▎ | 30960/60000 [14:44:07<75:55:03, 9.41s/it] 52%|██████████▎ | 30979/60000 [14:47:09<78:34:27, 9.75s/it] 52%|██████████▎ | 31000/60000 [14:50:32<77:14:57, 9.59s/it][INFO|trainer.py:3173] 2023-11-23 14:32:19,981 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-23 14:32:19,981 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-23 14:32:19,981 >> Batch size = 4 {'loss': 0.0228, 'learning_rate': 
1.4771186440677966e-06, 'epoch': 0.52} Reading metadata...: 1704it [00:00, 9424.36it/s] [INFO|trainer_utils.py:759] 2023-11-23 14:32:21,199 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: down_votes, up_votes, gender, locale, segment, client_id, accent, input_length, path, age. If down_votes, up_votes, gender, locale, segment, client_id, accent, input_length, path, age are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. {'eval_loss': 0.202880859375, 'eval_wer': 8.386732001507728, 'eval_runtime': 595.6213, 'eval_samples_per_second': 2.861, 'eval_steps_per_second': 0.715, 'epoch': 0.52} 52%|██████████▎ | 31000/60000 [15:00:28<77:14:57, 9.59s/it][INFO|trainer.py:2896] 2023-11-23 14:42:39,683 >> Saving model checkpoint to ./checkpoint-31000 [INFO|configuration_utils.py:462] 2023-11-23 14:42:39,693 >> Configuration saved in ./checkpoint-31000/config.json [INFO|configuration_utils.py:568] 2023-11-23 14:42:39,711 >> Configuration saved in ./checkpoint-31000/generation_config.json [2023-11-23 14:43:25,683] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step31000 is about to be saved! [2023-11-23 14:43:25,724] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-31000/global_step31000/mp_rank_00_model_states.pt [2023-11-23 14:43:25,724] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-31000/global_step31000/mp_rank_00_model_states.pt... [INFO|modeling_utils.py:2194] 2023-11-23 14:43:25,656 >> Model weights saved in ./checkpoint-31000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-23 14:43:25,662 >> Feature extractor saved in ./checkpoint-31000/preprocessor_config.json [2023-11-23 14:43:39,756] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-31000/global_step31000/mp_rank_00_model_states.pt. 
[2023-11-23 14:43:39,775] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-31000/global_step31000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2023-11-23 14:44:06,768] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-31000/global_step31000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2023-11-23 14:44:06,798] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-31000/global_step31000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2023-11-23 14:44:06,798] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step31000 is ready now! [INFO|feature_extraction_utils.py:425] 2023-11-23 14:45:03,018 >> Feature extractor saved in ./preprocessor_config.json 52%|██████████▎ | 31019/60000 [15:06:34<95:41:12, 11.89s/it] 52%|██████████▎ | 31039/60000 [15:09:53<79:19:11, 9.86s/it] 52%|██████████▎ | 31059/60000 [15:13:09<78:01:30, 9.71s/it] 52%|██████████▎ | 31079/60000 [15:16:29<80:23:28, 10.01s/it] 52%|██████████▎ | 31099/60000 [15:19:46<78:37:11, 9.79s/it] 52%|██████████▎ | 31119/60000 [15:23:05<76:41:11, 9.56s/it] 52%|██████████▍ | 31140/60000 [15:26:29<77:42:41, 9.69s/it] Reading metadata...: 2165it [00:00, 6749.61it/s]<76:31:41, 9.55s/it] 52%|██████████▍ | 31159/60000 [15:29:32<77:20:47, 9.65s/it] 52%|██████████▍ | 31180/60000 [15:32:55<77:27:19, 9.68s/it] 52%|██████████▍ | 31199/60000 [15:36:08<80:50:31, 10.10s/it] 52%|██████████▍ | 31219/60000 [15:39:21<76:46:47, 9.60s/it] 52%|██████████▍ | 31239/60000 [15:42:33<76:31:21, 9.58s/it] 52%|██████████▍ | 31259/60000 [15:45:44<76:36:26, 9.60s/it] 52%|██████████▍ | 31279/60000 [15:49:12<84:00:36, 10.53s/it] 52%|██████████▍ | 31299/60000 [15:52:29<78:14:05, 9.81s/it] 52%|██████████▍ | 31319/60000 [15:55:44<75:27:27, 9.47s/it] 52%|██████████▍ | 31340/60000 [15:59:07<77:56:01, 9.79s/it] 52%|██████████▍ | 31360/60000 [16:03:13<97:38:32, 12.27s/it] 52%|██████████▍ | 31379/60000 [16:06:22<77:34:20, 9.76s/it] 52%|██████████▍ | 31399/60000 
[16:09:37<76:35:12, 9.64s/it] 52%|██████████▍ | 31419/60000 [16:12:52<78:40:22, 9.91s/it] 52%|██████████▍ | 31440/60000 [16:16:16<77:01:35, 9.71s/it] 52%|██████████▍ | 31460/60000 [16:19:34<80:42:00, 10.18s/it] 52%|██████████▍ | 31479/60000 [16:22:36<75:20:09, 9.51s/it] 52%|██████████▍ | 31499/60000 [16:25:46<74:22:41, 9.39s/it] 53%|██████████▌ | 31520/60000 [16:29:07<75:48:58, 9.58s/it] 53%|██████████▌ | 31539/60000 [16:32:23<81:02:04, 10.25s/it] 53%|██████████▌ | 31559/60000 [16:35:43<75:24:15, 9.54s/it] 53%|██████████▌ | 31579/60000 [16:38:55<76:18:50, 9.67s/it] 53%|██████████▌ | 31600/60000 [16:42:46<75:49:11, 9.61s/it] 53%|██████████▌ | 31619/60000 [16:45:59<77:53:34, 9.88s/it] 53%|██████████▌ | 31639/60000 [16:49:17<80:36:13, 10.23s/it] 53%|██████████▌ | 31642/60000 [16:49:43<69:23:13, 8.81s/it] 53%|██████████▌ | 31643/60000 [16:49:49<62:55:40, 7.99s/it] Reading metadata...: 1650it [00:00, 4118.28it/s]<75:10:32, 9.55s/it] 53%|██████████▌ | 31659/60000 [16:52:24<76:39:33, 9.74s/it] 53%|██████████▌ | 31679/60000 [16:55:36<75:08:29, 9.55s/it] 53%|██████████▌ | 31699/60000 [16:58:50<77:46:06, 9.89s/it] 53%|██████████▌ | 31719/60000 [17:02:07<75:27:43, 9.61s/it] 53%|██████████▌ | 31739/60000 [17:05:51<81:27:09, 10.38s/it] 53%|██████████▌ | 31759/60000 [17:09:05<75:06:48, 9.58s/it] 53%|██████████▌ | 31779/60000 [17:12:19<76:15:47, 9.73s/it] 53%|██████████▌ | 31799/60000 [17:15:35<75:16:05, 9.61s/it] 53%|██████████▌ | 31819/60000 [17:18:52<76:15:16, 9.74s/it] 53%|██████████▌ | 31839/60000 [17:22:03<74:49:45, 9.57s/it] 53%|██████████▌ | 31859/60000 [17:25:14<75:52:47, 9.71s/it] 53%|██████████▋ | 31879/60000 [17:28:26<74:55:17, 9.59s/it] 53%|██████████▋ | 31899/60000 [17:31:38<76:16:29, 9.77s/it] 53%|██████████▋ | 31919/60000 [17:34:55<74:49:00, 9.59s/it] 53%|██████████▋ | 31939/60000 [17:38:07<74:52:52, 9.61s/it] 53%|██████████▋ | 31959/60000 [17:41:28<74:09:04, 9.52s/it] 53%|██████████▋ | 31980/60000 [17:44:54<76:17:54, 9.80s/it] 53%|██████████▋ | 31999/60000 
[17:48:04<75:32:17, 9.71s/it] 53%|██████████▋ | 32000/60000 [17:48:13<74:55:28, 9.63s/it][INFO|trainer.py:3173] 2023-11-23 17:30:00,790 >> ***** Running Evaluation ***** [INFO|trainer.py:3177] 2023-11-23 17:30:00,791 >> Num examples: Unknown [INFO|trainer.py:3178] 2023-11-23 17:30:00,791 >> Batch size = 4 Reading metadata...: 1704it [00:00, 5075.76it/s] [INFO|trainer_utils.py:759] 2023-11-23 17:30:02,321 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: down_votes, up_votes, gender, locale, segment, client_id, accent, input_length, path, age. If down_votes, up_votes, gender, locale, segment, client_id, accent, input_length, path, age are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. {'eval_loss': 0.2012939453125, 'eval_wer': 8.254805880135695, 'eval_runtime': 610.9979, 'eval_samples_per_second': 2.789, 'eval_steps_per_second': 0.697, 'epoch': 0.53} 53%|██████████▋ | 32000/60000 [17:58:24<74:55:28, 9.63s/it][INFO|trainer.py:2896] 2023-11-23 17:40:42,279 >> Saving model checkpoint to ./checkpoint-32000 [INFO|configuration_utils.py:462] 2023-11-23 17:40:42,287 >> Configuration saved in ./checkpoint-32000/config.json [INFO|configuration_utils.py:568] 2023-11-23 17:40:42,299 >> Configuration saved in ./checkpoint-32000/generation_config.json [INFO|modeling_utils.py:2194] 2023-11-23 17:41:23,174 >> Model weights saved in ./checkpoint-32000/pytorch_model.bin [INFO|feature_extraction_utils.py:425] 2023-11-23 17:41:23,179 >> Feature extractor saved in ./checkpoint-32000/preprocessor_config.json [2023-11-23 17:41:23,209] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step32000 is about to be saved! 
[2023-11-23 17:41:23,240] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-32000/global_step32000/mp_rank_00_model_states.pt
[2023-11-23 17:41:23,241] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-32000/global_step32000/mp_rank_00_model_states.pt...
[2023-11-23 17:41:29,490] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-32000/global_step32000/mp_rank_00_model_states.pt.
[2023-11-23 17:41:29,497] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./checkpoint-32000/global_step32000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2023-11-23 17:41:50,444] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./checkpoint-32000/global_step32000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2023-11-23 17:41:50,458] [INFO] [engine.py:3417:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-32000/global_step32000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2023-11-23 17:41:50,460] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step32000 is ready now!
[INFO|feature_extraction_utils.py:425] 2023-11-23 17:42:47,521 >> Feature extractor saved in ./preprocessor_config.json
53%|██████████▋ | 32019/60000 [18:04:12<79:28:54, 10.23s/it]
53%|██████████▋ | 32039/60000 [18:07:36<77:58:57, 10.04s/it]
53%|██████████▋ | 32059/60000 [18:10:56<77:19:23, 9.96s/it]