wandb: Currently logged in as: sanchit-gandhi (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.12.16 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.15
wandb: Run data is saved locally in /home/sanchitgandhi/flax-wav2vec2-2-bart-large-voxpopuli-baseline/wandb/run-20220510_091910-3qamzxaf
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run flax-wav2vec2-2-bart-large-voxpopuli-baseline
wandb: ⭐️ View project at https://wandb.ai/sanchit-gandhi/voxpopuli
wandb: 🚀 View run at https://wandb.ai/sanchit-gandhi/voxpopuli/runs/3qamzxaf
05/10/2022 09:19:11 - INFO - __main__ - Training/evaluation parameters FlaxSeq2SeqTrainingArguments(
_n_gpu=-1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=,
deepspeed=None,
disable_tqdm=None,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10000,
evaluation_strategy=no,
final_generation_max_length=200,
final_generation_num_beams=5,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
generation_length_penalty=1.2,
generation_max_length=40,
generation_num_beams=1,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=None,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=25,
logging_strategy=steps,
lr_scheduler_type=linear,
matmul_precision=default,
max_grad_norm=1.0,
max_steps=50000,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
output_dir=./,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=8,
precision=full,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
remove_unused_columns=True,
report_to=None,
resume_from_checkpoint=None,
run_name=None,
save_on_each_node=False,
save_steps=10000,
save_strategy=steps,
save_total_limit=1,
seed=42,
sharded_ddp=,
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=500,
weight_decay=0.0,
xpu_backend=None,
)
05/10/2022 09:19:11 - INFO - __main__ - JAX devices: 8, matmul precision: default
Downloading data files: 0% 0/12 [00:00<?, ?it/s]
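A quick sanity check on the parallelism these arguments imply: with the 8 JAX devices reported above, per_device_train_batch_size=8 and gradient_accumulation_steps=1 give an effective train batch of 64 examples per update, and an eval batch of 32. A minimal sketch, assuming it is run on the same 8-device host:

    import jax

    # The log above reports "JAX devices: 8".
    devices = jax.device_count()         # 8 on this host
    effective_train_batch = 8 * devices  # per_device_train_batch_size=8 -> 64
    effective_eval_batch = 4 * devices   # per_device_eval_batch_size=4  -> 32
    print(devices, effective_train_batch, effective_eval_batch)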
[... data download, preprocessing and earlier training logs truncated ...]
Epoch... (8/20 | Eval Loss: 0.5150057673454285 | Eval wer: 0.1716611172770995 |): 55% 11/20 [29:50:22<23:39:35, 9464.00s/it]
Training...: 32% 822/2608 [50:09<2:00:33, 4.05s/it]
[... per-step tqdm progress updates 823/2608 → 1082/2608 elided ...]
wandb: Network error (ReadTimeout), entering retry loop.
[... per-step tqdm progress updates 1083/2608 → 1310/2608 elided ...]
Training...: 50% 1311/2608 [1:19:49<1:57:50, 5.45s/it]
Step... (28700 | Loss: 0.01324186660349369, Learning Rate: 4.3032323446823284e-05, Gradient Norm: 0.3575594425201416)
Step... (28725 | Loss: 0.005364256910979748, Learning Rate: 4.298182102502324e-05, Gradient Norm: 0.41470810770988464)
Step... (28750 | Loss: 0.017489353194832802, Learning Rate: 4.2931311327265576e-05, Gradient Norm: 0.46892568469047546)
Step... (28775 | Loss: 0.005093330051749945, Learning Rate: 4.288080890546553e-05, Gradient Norm: 0.22218424081802368)
Step... (28800 | Loss: 0.019066592678427696, Learning Rate: 4.283030648366548e-05, Gradient Norm: 0.3883430063724518)
Step... (28825 | Loss: 0.00597224710509181, Learning Rate: 4.277979678590782e-05, Gradient Norm: 0.31905898451805115)
Step... (28850 | Loss: 0.013649072498083115, Learning Rate: 4.272929436410777e-05, Gradient Norm: 0.3545985519886017)
Step... (28875 | Loss: 0.014111006632447243, Learning Rate: 4.267878466635011e-05, Gradient Norm: 0.6679030060768127)
Step... (28900 | Loss: 0.018808016553521156, Learning Rate: 4.2628282244550064e-05, Gradient Norm: 0.48444387316703796)
Step... (28925 | Loss: 0.010545330122113228, Learning Rate: 4.257777982275002e-05, Gradient Norm: 0.43524983525276184)
Step... (28950 | Loss: 0.025198884308338165, Learning Rate: 4.2527270124992356e-05, Gradient Norm: 0.7812223434448242)
Step... (28975 | Loss: 0.006328547839075327, Learning Rate: 4.247676770319231e-05, Gradient Norm: 0.3543586730957031)
Step... (29000 | Loss: 0.023827416822314262, Learning Rate: 4.2426261643413454e-05, Gradient Norm: 0.5372229814529419)
Step... (29025 | Loss: 0.01068464107811451, Learning Rate: 4.23757555836346e-05, Gradient Norm: 0.28735247254371643)
Step... (29050 | Loss: 0.01814340427517891, Learning Rate: 4.2325249523855746e-05, Gradient Norm: 0.44269007444381714)
Step... (29075 | Loss: 0.0040423269383609295, Learning Rate: 4.22747471020557e-05, Gradient Norm: 0.2711266875267029)
Step... (29100 | Loss: 0.013016236014664173, Learning Rate: 4.2224241042276844e-05, Gradient Norm: 0.2726873457431793)
Step... (29125 | Loss: 0.01034215372055769, Learning Rate: 4.217373498249799e-05, Gradient Norm: 0.41604796051979065)
Step... (29150 | Loss: 0.02490173652768135, Learning Rate: 4.212323256069794e-05, Gradient Norm: 0.5067036747932434)
Step... (29175 | Loss: 0.00537874223664403, Learning Rate: 4.207272286294028e-05, Gradient Norm: 0.24741435050964355)
Step... (29200 | Loss: 0.024084780365228653, Learning Rate: 4.2022220441140234e-05, Gradient Norm: 0.5818319320678711)
Step... (29225 | Loss: 0.005320483818650246, Learning Rate: 4.197171801934019e-05, Gradient Norm: 0.37143582105636597)
Step... (29250 | Loss: 0.012396669015288353, Learning Rate: 4.1921208321582526e-05, Gradient Norm: 0.30165690183639526)
Step... (29275 | Loss: 0.012446265667676926, Learning Rate: 4.187070589978248e-05, Gradient Norm: 0.4551958441734314)
Step... (29300 | Loss: 0.010365981608629227, Learning Rate: 4.182020347798243e-05, Gradient Norm: 0.3267727494239807)
Step... (29325 | Loss: 0.008926907554268837, Learning Rate: 4.176969378022477e-05, Gradient Norm: 0.5192976593971252)
Step... (29350 | Loss: 0.011660218238830566, Learning Rate: 4.171919135842472e-05, Gradient Norm: 0.33286282420158386)
Step... (29375 | Loss: 0.007073409855365753, Learning Rate: 4.1668688936624676e-05, Gradient Norm: 0.5280326008796692)
Step... (29400 | Loss: 0.022164685651659966, Learning Rate: 4.1618179238867015e-05, Gradient Norm: 0.43605518341064453)
Step... (29425 | Loss: 0.012267806567251682, Learning Rate: 4.156767681706697e-05, Gradient Norm: 0.4562574326992035)
Step... (29450 | Loss: 0.015768524259328842, Learning Rate: 4.151717439526692e-05, Gradient Norm: 0.3778232932090759)
Step... (29475 | Loss: 0.014861389994621277, Learning Rate: 4.146666469750926e-05, Gradient Norm: 0.5881226062774658)
Step... (29500 | Loss: 0.027196036651730537, Learning Rate: 4.141616227570921e-05, Gradient Norm: 0.5360493659973145)
Step... (29525 | Loss: 0.028068453073501587, Learning Rate: 4.1365659853909165e-05, Gradient Norm: 0.7262980341911316)
Step... (29550 | Loss: 0.01791018806397915, Learning Rate: 4.13151501561515e-05, Gradient Norm: 0.42247477173805237)
Step... (29575 | Loss: 0.0018388490425422788, Learning Rate: 4.1264647734351456e-05, Gradient Norm: 0.13318009674549103)
Step... (29600 | Loss: 0.017437970265746117, Learning Rate: 4.12141416745726e-05, Gradient Norm: 0.38107994198799133)
Step... (29625 | Loss: 0.004518967587500811, Learning Rate: 4.116363561479375e-05, Gradient Norm: 0.2731289565563202)
Step... (29650 | Loss: 0.019883807748556137, Learning Rate: 4.1113129555014893e-05, Gradient Norm: 0.40272489190101624)
Step... (29675 | Loss: 0.00949388649314642, Learning Rate: 4.1062627133214846e-05, Gradient Norm: 0.44167739152908325)
Step... (29700 | Loss: 0.023795781657099724, Learning Rate: 4.1012117435457185e-05, Gradient Norm: 0.4918016493320465)
Step... (29725 | Loss: 0.009676703251898289, Learning Rate: 4.096161501365714e-05, Gradient Norm: 0.5256915092468262)
Step... (29750 | Loss: 0.01882634498178959, Learning Rate: 4.091111259185709e-05, Gradient Norm: 0.7519373297691345)
Step... (29775 | Loss: 0.010672434233129025, Learning Rate: 4.086060289409943e-05, Gradient Norm: 0.42842066287994385)
Step... (29800 | Loss: 0.016137100756168365, Learning Rate: 4.081010047229938e-05, Gradient Norm: 0.39318785071372986)
Step... (29825 | Loss: 0.013787748292088509, Learning Rate: 4.0759598050499335e-05, Gradient Norm: 0.8088423609733582)
Step... (29850 | Loss: 0.016184644773602486, Learning Rate: 4.0709088352741674e-05, Gradient Norm: 0.4466698467731476)
Step... (29875 | Loss: 0.007321678102016449, Learning Rate: 4.0658585930941626e-05, Gradient Norm: 0.32334280014038086)
Step... (29900 | Loss: 0.01876140758395195, Learning Rate: 4.0608076233183965e-05, Gradient Norm: 0.4208255410194397)
Step... (29925 | Loss: 0.02345845475792885, Learning Rate: 4.055757381138392e-05, Gradient Norm: 0.7127450108528137)
Step... (29950 | Loss: 0.017886696383357048, Learning Rate: 4.050707138958387e-05, Gradient Norm: 0.33364027738571167)
Step... (29975 | Loss: 0.021145498380064964, Learning Rate: 4.045656169182621e-05, Gradient Norm: 0.6187448501586914)
Step... (30000 | Loss: 0.020344849675893784, Learning Rate: 4.040605927002616e-05, Gradient Norm: 0.4143028259277344)
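The learning rates in these Step records trace the linear schedule implied by the arguments above (warmup_steps=500, learning_rate=0.0001, lr_scheduler_type=linear, max_steps=50000). A sketch of that schedule in optax, assuming the warmup-plus-linear-decay construction the Flax examples typically use rather than the script's verbatim code; evaluating it one count before step 28700 reproduces the logged 4.3032e-05:

    import optax

    # Assumed shape: linear warmup over 500 steps to 1e-4, then linear decay
    # to 0 over the remaining 49,500 of the 50,000 total steps.
    warmup_fn = optax.linear_schedule(init_value=0.0, end_value=1e-4, transition_steps=500)
    decay_fn = optax.linear_schedule(init_value=1e-4, end_value=0.0, transition_steps=50_000 - 500)
    schedule_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[500])

    # Cross-check against the log: Step 28700 reports lr = 4.3032323446823284e-05.
    print(schedule_fn(28_699))  # ~4.30323e-05 (the schedule count starts at 0)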
Evaluating ...: 0% 0/49 [00:00<?, ?it/s]
[... evaluation, prediction and checkpoint-save logs truncated; the subsequent push to the Hub is rejected ...]
 ! [remote rejected] main -> main (pre-receive hook declined)
error: failed to push some refs to 'https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-voxpopuli-baseline'
05/12/2022 18:22:32 - ERROR - huggingface_hub.repository - remote: Enforcing permissions...
remote: Allowed refs: all
remote: -------------------------------------------------------------------------
remote: Your push was rejected because it contains binary files.
remote: Please use https://git-lfs.github.com/ to store binary files.
remote: See also: https://hf.co/docs/hub/adding-a-model#uploading-your-files
remote: -------------------------------------------------------------------------
remote: Offending files:
remote: - wandb/run-20220510_091910-3qamzxaf/run-3qamzxaf.wandb (ref: refs/heads/main)
To https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-voxpopuli-baseline
 ! [remote rejected] main -> main (pre-receive hook declined)
error: failed to push some refs to 'https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-voxpopuli-baseline'
05/12/2022 18:22:32 - ERROR - huggingface_hub.repository - The push command with PID 1524345 failed.
[... the identical rejection is logged for the remaining queued push commands, PIDs 1564583, 1604028 and 1643861 ...]
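All of these failures share one root cause: output_dir is the repository root (./), so the binary W&B run file wandb/run-20220510_091910-3qamzxaf/run-3qamzxaf.wandb gets committed to the model repo, and the Hub's pre-receive hook rejects binaries that are not stored in Git LFS. One possible remedy, following the hint in the remote message, is to track .wandb files with Git LFS and retry the push. A sketch using huggingface_hub's Repository helper (an illustration, not the training script's own recovery code; adding wandb/ to .gitignore before training would avoid the problem entirely):

    from huggingface_hub import Repository

    # Open the existing local clone; here the training output_dir ("./")
    # doubles as the repository root.
    repo = Repository(local_dir=".")
    # Register *.wandb as a Git LFS pattern, stage the updated .gitattributes,
    # then commit and push everything in one go.
    repo.lfs_track(["*.wandb"])
    repo.git_add(".gitattributes")
    repo.push_to_hub(commit_message="Track W&B run files with Git LFS")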
wandb: Waiting for W&B process to finish... (success).
wandb: 5.647 MB of 5.647 MB uploaded (0.000 MB deduped)
wandb:
wandb: Run history:
wandb:                eval/loss ▁▃▆▇█
wandb:                 eval/wer ▁▇▆█▁
wandb:                test/loss ▁
wandb:                 test/wer ▁
wandb:  train/decoder_grad_norm ▆▄▂▂▂▂▂█▂▁▂▂▁▂▂▂▁▁▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: train/decoder_param_norm ▃▂▁▁▁▂▂▃▃▄▄▅▅▅▆▆▆▆▇▇▇▇▇▇████████████████
wandb:  train/encoder_grad_norm ▇█▅▄▄▃▃█▃▃▄▃▂▃▃▃▃▃▂▃▃▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁
wandb: train/encoder_param_norm ▁▂▂▃▃▃▄▄▄▅▅▅▆▆▆▆▇▇▇▇▇▇▇█████████████████
wandb:          train/grad_norm ▆▄▃▂▂▂▂█▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:      train/learning_rate ▇███▇▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▁▁▁
wandb:               train/loss █▃▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:         train/param_norm ▁▁▁▂▂▃▃▃▄▄▅▅▅▆▆▆▆▇▇▇▇▇▇▇████████████████
wandb:
wandb: Run summary:
wandb:                eval/loss 0.73377
wandb:                 eval/wer 0.15111
wandb:                test/loss 0.6892
wandb:                 test/wer 0.14375
wandb:  train/decoder_grad_norm 0.00273
wandb: train/decoder_param_norm 1063.03467
wandb:  train/encoder_grad_norm 0.002
wandb: train/encoder_param_norm 2323.10889
wandb:          train/grad_norm 0.00338
wandb:      train/learning_rate 0.0
wandb:               train/loss 5e-05
wandb:         train/param_norm 2554.77539
wandb:
wandb: Synced flax-wav2vec2-2-bart-large-voxpopuli-baseline: https://wandb.ai/sanchit-gandhi/voxpopuli/runs/3qamzxaf
wandb: Synced 5 W&B file(s), 8 media file(s), 8 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20220510_091910-3qamzxaf/logs
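The summary reports word error rate as a fraction (eval/wer 0.15111, i.e. roughly 15.1%). A minimal sketch of how such a score is computed, assuming the `wer` metric from 🤗 Datasets that the Flax speech examples typically load (an assumption; the metric code itself is not shown in this log):

    from datasets import load_metric  # the "wer" metric requires the jiwer backend

    wer_metric = load_metric("wer")
    # Toy strings: one substitution ("the" for "a") over six reference words.
    predictions = ["the cat sat on the mat"]
    references = ["the cat sat on a mat"]
    print(wer_metric.compute(predictions=predictions, references=references))  # 1/6 ~ 0.167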