diff --git "a/wandb/run-20220325_193848-1sz5964i/files/output.log" "b/wandb/run-20220325_193848-1sz5964i/files/output.log" --- "a/wandb/run-20220325_193848-1sz5964i/files/output.log" +++ "b/wandb/run-20220325_193848-1sz5964i/files/output.log" @@ -12439,3 +12439,1342 @@ {'eval_loss': 0.36502909660339355, 'eval_wer': 0.11207854026180088, 'eval_runtime': 567.0865, 'eval_samples_per_second': 4.659, 'eval_steps_per_second': 0.584, 'epoch': 4.48} [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1683, 'learning_rate': 5.756097560975609e-05, 'epoch': 4.49} +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1527, 'learning_rate': 5.707317073170731e-05, 'epoch': 4.49} +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1248, 'learning_rate': 5.6585365853658533e-05, 'epoch': 4.5} +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1299, 'learning_rate': 5.609756097560975e-05, 'epoch': 4.5} +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1563, 'learning_rate': 5.560975609756097e-05, 'epoch': 4.51} +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1239, 'learning_rate': 5.512195121951219e-05, 'epoch': 4.51} + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1481, 'learning_rate': 5.4634146341463415e-05, 'epoch': 4.52} + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1316, 'learning_rate': 5.4146341463414625e-05, 'epoch': 4.52} + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1313, 'learning_rate': 5.365853658536585e-05, 'epoch': 4.52} + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1185, 'learning_rate': 5.317073170731707e-05, 'epoch': 4.53}
+{'loss': 0.1288, 'learning_rate': 5.268292682926828e-05, 'epoch': 4.53}
+{'loss': 0.1273, 'learning_rate': 5.2195121951219506e-05, 'epoch': 4.54}
+{'loss': 0.1418, 'learning_rate': 5.170731707317073e-05, 'epoch': 4.54}
+ 91%|██████████████████████████████████████████████████████████████████████ | 1014/1115 [6:37:26<45:43, 27.16s/it]
+{'loss': 0.1133, 'learning_rate': 5.121951219512195e-05, 'epoch': 4.55}
+{'loss': 0.1178, 'learning_rate': 5.0731707317073163e-05, 'epoch': 4.55}
+{'loss': 0.097, 'learning_rate': 5.024390243902439e-05, 'epoch': 4.56}
+{'loss': 0.1185, 'learning_rate': 4.975609756097561e-05, 'epoch': 4.56}
+ 91%|██████████████████████████████████████████████████████████████████████▎ | 1018/1115 [6:39:02<40:03, 24.77s/it]
+[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1152, 'learning_rate': 4.8780487804878045e-05, 'epoch': 4.57}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1227, 'learning_rate': 4.829268292682927e-05, 'epoch': 4.57}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1017, 'learning_rate': 4.7804878048780485e-05, 'epoch': 4.58}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0982, 'learning_rate': 4.73170731707317e-05, 'epoch': 4.58}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 92%|██████████████████████████████████████████████████████████████████████▋ | 1023/1115 [6:40:59<35:51, 23.39s/it]
+{'loss': 0.1289, 'learning_rate': 4.6829268292682926e-05, 'epoch': 4.59}
+ 92%|██████████████████████████████████████████████████████████████████████▋ | 1023/1115 [6:40:59<35:51, 23.39s/it]
+ 92%|██████████████████████████████████████████████████████████████████████▋ | 1024/1115 [6:41:22<35:10, 23.19s/it]
+{'loss': 0.0997, 'learning_rate': 4.634146341463414e-05, 'epoch': 4.59}
+ 92%|██████████████████████████████████████████████████████████████████████▋ | 1024/1115 [6:41:22<35:10, 23.19s/it]
+[WARNING|modeling_utils.py:388] 2022-03-26 02:20:19,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 02:20:23,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 02:20:23,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0879, 'learning_rate': 4.585365853658536e-05, 'epoch': 4.6}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:20:23,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:20:54,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 92%|██████████████████████████████████████████████████████████████████████▊ | 1026/1115 [6:42:07<33:49, 22.81s/it]
+{'loss': 0.1062, 'learning_rate': 4.48780487804878e-05, 'epoch': 4.61}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:21:20,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 02:21:20,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 02:21:30,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 92%|██████████████████████████████████████████████████████████████████████▉ | 1028/1115 [6:42:49<31:54, 22.00s/it]
+{'loss': 0.0964, 'learning_rate': 4.4390243902439024e-05, 'epoch': 4.61}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:21:45,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 02:21:53,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 02:21:59,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0981, 'learning_rate': 4.3902439024390234e-05, 'epoch': 4.61}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:22:03,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:07,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:14,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 92%|███████████████████████████████████████████████████████████████████████▏ | 1030/1115 [6:43:30<29:52, 21.09s/it]
+ 92%|███████████████████████████████████████████████████████████████████████▏ | 1030/1115 [6:43:30<29:52, 21.09s/it]
+{'loss': 0.068, 'learning_rate': 4.341463414634146e-05, 'epoch': 4.62}
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:23,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:22:27,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 02:22:33,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-26 02:22:33,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:38,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:38,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1051, 'learning_rate': 4.292682926829268e-05, 'epoch': 4.62} +[WARNING|modeling_utils.py:388] 2022-03-26 02:22:42,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:22:42,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:22:42,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:22:42,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:22:50,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:22:52,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:22:52,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:56,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 1032/1115 [6:44:09<28:01, 20.26s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▎ | 1032/1115 [6:44:09<28:01, 20.26s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:23:00,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:23:00,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:04,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:07,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:07,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:23:10,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:23:12,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:23:15,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:23:15,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:23:15,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:19,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:21,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:23,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:25,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:27,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:29,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:31,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:31,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▍ | 1034/1115 [6:44:43<25:11, 18.66s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:35,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:37,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:39,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:41,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:43,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:44,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:46,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:46,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▍ | 1035/1115 [6:44:59<23:35, 17.70s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:50,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:52,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:56,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:57,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:59,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:01,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:01,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▌ | 1036/1115 [6:45:13<21:55, 16.65s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:24:03,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:06,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:03,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:07,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:03,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:09,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:03,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:11,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:03,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:12,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:03,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:12,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:03,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▌ | 1037/1115 [6:45:26<20:08, 15.50s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:24:15,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:17,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:15,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:20,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:15,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:20,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:15,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:23,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:15,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:25,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:15,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:25,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:15,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▋ | 1038/1115 [6:45:38<18:37, 14.52s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:24:27,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:29,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:27,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:31,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:27,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:34,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:27,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:36,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:27,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:36,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:27,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1039/1115 [6:45:48<16:38, 13.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:24:37,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:40,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:37,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:42,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:37,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:44,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:37,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:44,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:37,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1040/1115 [6:45:56<14:44, 11.79s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:24:46,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:48,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:46,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:50,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:46,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:51,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:46,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:51,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:46,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:54,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:53,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:56,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:53,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:58,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:53,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:58,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:24:53,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 1042/1115 [6:46:10<11:18, 9.29s/it] Setting `use_cache=False`...1] 2022-03-26 02:24:53,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▉ | 1042/1115 [6:46:10<11:18, 9.29s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████▉ | 1042/1115 [6:46:10<11:18, 9.29s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:05,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:08,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:08,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:12,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:12,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:15,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:15,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:19,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:23,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:23,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:26,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 1043/1115 [6:46:39<18:19, 15.27s/it] Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 1043/1115 [6:46:39<18:19, 15.27s/it] Setting `use_cache=False`...1] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████ | 1043/1115 [6:46:39<18:19, 15.27s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 1043/1115 [6:46:39<18:19, 15.27s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:33,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:37,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:37,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:41,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:41,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:44,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:44,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:48,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:48,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:52,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:52,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:56,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:56,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 1044/1115 [6:47:09<23:05, 19.51s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████ | 1044/1115 [6:47:09<23:05, 19.51s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:03,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:03,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:06,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:10,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:10,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:13,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:13,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:17,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:20,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:20,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:24,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▏ | 1045/1115 [6:47:37<25:40, 22.01s/it] Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 1045/1115 [6:47:37<25:40, 22.01s/it] Setting `use_cache=False`...1] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▏ | 1045/1115 [6:47:37<25:40, 22.01s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:30,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:30,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:34,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:34,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2143, 'learning_rate': 3.560975609756097e-05, 'epoch': 4.69} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1886, 'learning_rate': 3.512195121951219e-05, 'epoch': 4.7} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2147, 'learning_rate': 3.463414634146341e-05, 'epoch': 4.7} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1671, 'learning_rate': 3.365853658536585e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.15, 'learning_rate': 3.317073170731707e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1564, 'learning_rate': 3.268292682926829e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1517, 'learning_rate': 3.219512195121951e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.165, 'learning_rate': 3.170731707317073e-05, 'epoch': 4.73}
+{'loss': 0.1202, 'learning_rate': 3.121951219512195e-05, 'epoch': 4.73}
+{'loss': 0.1199, 'learning_rate': 3.0731707317073165e-05, 'epoch': 4.74}
+{'loss': 0.1425, 'learning_rate': 3.024390243902439e-05, 'epoch': 4.74}
+{'loss': 0.1322, 'learning_rate': 2.9756097560975606e-05, 'epoch': 4.74}
+{'loss': 0.1356, 'learning_rate': 2.9268292682926826e-05, 'epoch': 4.75}
+{'loss': 0.1227, 'learning_rate': 2.8780487804878046e-05, 'epoch': 4.75}
+{'loss': 0.1052, 'learning_rate': 2.8292682926829267e-05, 'epoch': 4.76}
+{'loss': 0.1255, 'learning_rate': 2.7804878048780484e-05, 'epoch': 4.76}
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0951, 'learning_rate': 2.7317073170731707e-05, 'epoch': 4.77} + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it] Setting `use_cache=False`...1] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1074, 'learning_rate': 2.6829268292682924e-05, 'epoch': 4.77} +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1143, 'learning_rate': 2.634146341463414e-05, 'epoch': 4.78} +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1112, 'learning_rate': 2.5853658536585365e-05, 'epoch': 4.78} + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1014, 'learning_rate': 2.5365853658536582e-05, 'epoch': 4.78} + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0987, 'learning_rate': 2.4878048780487805e-05, 'epoch': 4.79} + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1075, 'learning_rate': 2.3902439024390243e-05, 'epoch': 4.8} + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1121, 'learning_rate': 2.3414634146341463e-05, 'epoch': 4.8} + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0914, 'learning_rate': 2.292682926829268e-05, 'epoch': 4.81} + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1099, 'learning_rate': 2.24390243902439e-05, 'epoch': 4.81} + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0843, 'learning_rate': 2.1951219512195117e-05, 'epoch': 4.82} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1039, 'learning_rate': 2.146341463414634e-05, 'epoch': 4.82} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:14,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:39:14,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:14,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:14,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:14,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1008, 'learning_rate': 2.0975609756097558e-05, 'epoch': 4.83} +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:14,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:14,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:29,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:39:29,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:29,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:29,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:37,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:37,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:37,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:37,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0954, 'learning_rate': 2.048780487804878e-05, 'epoch': 4.83} +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:37,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:37,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:39:49,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:39:49,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:39:49,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:55,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:55,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:39:55,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:39:55,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 1078/1115 [7:01:14<13:24, 21.73s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▍ | 1078/1115 [7:01:14<13:24, 21.73s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0986, 'learning_rate': 1.9999999999999998e-05, 'epoch': 4.83} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 1078/1115 [7:01:14<13:24, 21.73s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:09,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:09,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:40:09,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:09,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:09,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:20,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:20,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 1079/1115 [7:01:34<12:47, 21.33s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▌ | 1079/1115 [7:01:34<12:47, 21.33s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:40:26,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:26,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:26,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:32,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:32,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:32,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:38,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:40:38,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:38,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:38,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:44,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:44,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:44,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:50,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:40:50,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:40:55,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:40:55,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:40:59,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:01,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:01,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:01,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0804, 'learning_rate': 1.8536585365853656e-05, 'epoch': 4.85} +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:01,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:01,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:11,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:11,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:15,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:18,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:18,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:41:21,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:21,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0896, 'learning_rate': 1.8048780487804876e-05, 'epoch': 4.85} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:26,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:28,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:30,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:30,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:41:33,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:41:33,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:37,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▊ | 1083/1115 [7:02:49<10:13, 19.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▊ | 1083/1115 [7:02:49<10:13, 19.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:41,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:44,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:46,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:48,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:50,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:52,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:53,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:53,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▊ | 1084/1115 [7:03:06<09:25, 18.26s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:57,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:59,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:01,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:03,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:05,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:08,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:08,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 97%|██████████████████████████████████████████████████████████████████████████▉ | 1085/1115 [7:03:20<08:36, 17.20s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:12,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:14,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:15,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:17,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:20,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:22,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:22,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████▉ | 1086/1115 [7:03:34<07:47, 16.11s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:42:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:25,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:28,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:30,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:31,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:34,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:34,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|███████████████████████████████████████████████████████████████████████████ | 1087/1115 [7:03:46<06:59, 15.00s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:42:36,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:39,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:36,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:40,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:36,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:42,852 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:36,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:45,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:36,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:46,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:36,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:46,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:36,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:49,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:48,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:50,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:48,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:53,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:48,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:55,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:48,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▏ | 1089/1115 [7:04:08<05:32, 12.81s/it] Setting `use_cache=False`...1] 2022-03-26 02:42:48,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▏ | 1089/1115 [7:04:08<05:32, 12.81s/it] Setting `use_cache=False`...1] 2022-03-26 02:42:48,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:59,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:01,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:03,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:42:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:07,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:06,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:09,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:06,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:11,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:06,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:13,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:06,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:13,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:06,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:15,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:14,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:17,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:14,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:19,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:14,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:19,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:14,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▍ | 1092/1115 [7:04:31<03:31, 9.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▍ | 1092/1115 [7:04:31<03:31, 9.18s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:25,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:25,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:29,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:29,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:32,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:36,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:36,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:39,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:39,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:43,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:43,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:46,693 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:46,693 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▍ | 1093/1115 [7:04:59<05:30, 15.04s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▍ | 1093/1115 [7:04:59<05:30, 15.04s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:53,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:53,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:57,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:00,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:00,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:03,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:03,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:07,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:07,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:11,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:15,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:15,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 1094/1115 [7:05:28<06:40, 19.09s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 1094/1115 [7:05:28<06:40, 19.09s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1688, 'learning_rate': 1.2195121951219511e-05, 'epoch': 4.91} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:22,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:25,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:25,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:28,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:28,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:32,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:32,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:35,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:39,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:39,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:42,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 1095/1115 [7:05:55<07:09, 21.49s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 1095/1115 [7:05:55<07:09, 21.49s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▌ | 1095/1115 [7:05:55<07:09, 21.49s/it][WARNING|modeling_bart.py:1051] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:49,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:49,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:52,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:56,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:56,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:59,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:59,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:45:02,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:45:06,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:45:06,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:45:09,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:45:09,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1589, 'learning_rate': 1.073170731707317e-05, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1198, 'learning_rate': 1.024390243902439e-05, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1327, 'learning_rate': 9.75609756097561e-06, 'epoch': 4.93} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1172, 'learning_rate': 9.268292682926828e-06, 'epoch': 4.93} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1378, 'learning_rate': 8.780487804878048e-06, 'epoch': 4.94} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1146, 'learning_rate': 8.292682926829267e-06, 'epoch': 4.94} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1076, 'learning_rate': 7.804878048780487e-06, 'epoch': 4.95} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:21,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:21,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:21,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:21,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:21,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0927, 'learning_rate': 7.3170731707317065e-06, 'epoch': 4.95} +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1055, 'learning_rate': 6.3414634146341454e-06, 'epoch': 4.96} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0898, 'learning_rate': 5.853658536585366e-06, 'epoch': 4.96} +[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:49:49,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:49:49,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:49:53,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-26 02:49:53,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-26 02:49:57,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:49:57,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:49:57,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:49:57,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:49:57,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.1008, 'learning_rate': 4.878048780487805e-06, 'epoch': 4.97}
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:30,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:30,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:30,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:36,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:36,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:36,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:36,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:50:44,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:50:44,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+100%|████████████████████████████████████████████████████████████████████████████▋| 1110/1115 [7:11:57<01:48, 21.65s/it] Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:48,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:48,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:50:52,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-26 02:50:52,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:56,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:50:58,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:00,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:02,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:02,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:05,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:07,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:09,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:10,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:12,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:14,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:16,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:18,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:18,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:19,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:21,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:24,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:27,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:28,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:31,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:31,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:32,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:35,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:36,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:38,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:40,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:40,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:42,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:45,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:47,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-26 02:51:47,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.2069, 'learning_rate': 1.9512195121951218e-06, 'epoch': 5.0}
+100%|█████████████████████████████████████████████████████████████████████████████| 1115/1115 [7:12:59<00:00, 23.30s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|feature_extraction_utils.py:324] 2022-03-26 02:52:00,252 >> Configuration saved in ./preprocessor_config.jsons/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|feature_extraction_utils.py:324] 2022-03-26 02:52:12,187 >> Configuration saved in ./preprocessor_config.jsons/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...