diff --git "a/wandb/run-20220328_170142-by95ehra/files/output.log" "b/wandb/run-20220328_170142-by95ehra/files/output.log" --- "a/wandb/run-20220328_170142-by95ehra/files/output.log" +++ "b/wandb/run-20220328_170142-by95ehra/files/output.log" @@ -6369,3 +6369,6143 @@ [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 03/28/2022 20:16:48 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220328_170142-by95ehra/run-by95ehra.wandb']. This may take a bit of time if the files are large. +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.3839, 'learning_rate': 0.0002988, 'epoch': 4.51} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2854, 'learning_rate': 0.00029939999999999996, 'epoch': 4.52} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1582, 'learning_rate': 0.0003, 'epoch': 4.53} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9157, 'learning_rate': 0.0002995081967213115, 'epoch': 4.54} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.851, 'learning_rate': 0.0002990163934426229, 'epoch': 4.55} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8247, 'learning_rate': 0.00029852459016393437, 'epoch': 4.56} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7489, 'learning_rate': 0.00029803278688524587, 'epoch': 4.57} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6941, 'learning_rate': 0.00029754098360655737, 'epoch': 4.57} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.513, 'learning_rate': 0.0002970491803278688, 'epoch': 4.58} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.401, 'learning_rate': 0.0002965573770491803, 'epoch': 4.59} +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:46,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:46,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:50,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:50,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.3516, 'learning_rate': 0.0002960655737704918, 'epoch': 4.6} +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:54,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:54,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:58,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:58,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:58,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:58,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:21:58,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:22:09,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:09,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████ | 512/1110 [3:20:29<4:17:01, 25.79s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████ | 512/1110 [3:20:29<4:17:01, 25.79s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.27, 'learning_rate': 0.00029557377049180326, 'epoch': 4.61} + 46%|███████████████████████████████████ | 512/1110 [3:20:29<4:17:01, 25.79s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████ | 512/1110 [3:20:29<4:17:01, 25.79s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████ | 512/1110 [3:20:29<4:17:01, 25.79s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████ | 512/1110 [3:20:29<4:17:01, 25.79s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 46%|███████████████████████████████████ | 512/1110 [3:20:29<4:17:01, 25.79s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:27,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:27,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:27,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:33,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:33,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2286, 'learning_rate': 0.0002950819672131147, 'epoch': 4.62} +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:33,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:22:39,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:39,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:44,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:44,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:48,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:22:48,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:52,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:52,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.1139, 'learning_rate': 0.0002945901639344262, 'epoch': 4.63} +[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:52,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:58,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:00,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:02,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:04,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:06,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:08,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████▎ | 515/1110 [3:21:26<3:28:12, 21.00s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████▎ | 515/1110 [3:21:26<3:28:12, 21.00s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:12,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:14,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:16,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:18,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:19,815 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:21,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████▎ | 516/1110 [3:21:41<3:08:51, 19.08s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 46%|███████████████████████████████████▎ | 516/1110 [3:21:41<3:08:51, 19.08s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:26,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:28,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:31,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:32,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:32,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:36,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▍ | 517/1110 [3:21:54<2:51:39, 17.37s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:23:38,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▍ | 517/1110 [3:21:54<2:51:39, 17.37s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:23:38,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:40,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:38,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:43,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:38,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:45,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:38,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 47%|███████████████████████████████████▍ | 518/1110 [3:22:03<2:27:40, 14.97s/it] Setting `use_cache=False`...1] 2022-03-28 20:23:38,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▍ | 518/1110 [3:22:03<2:27:40, 14.97s/it] Setting `use_cache=False`...1] 2022-03-28 20:23:38,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:49,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:47,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:51,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:47,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:52,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:47,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▌ | 519/1110 [3:22:10<2:04:15, 12.62s/it] Setting `use_cache=False`...1] 2022-03-28 20:23:47,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▌ | 519/1110 [3:22:10<2:04:15, 12.62s/it] Setting `use_cache=False`...1] 2022-03-28 20:23:47,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▌ | 519/1110 [3:22:10<2:04:15, 12.62s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:59,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:59,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:03,106 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:03,106 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:06,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:06,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:10,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:10,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:13,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:17,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:17,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:21,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:21,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:21,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 47%|███████████████████████████████████▌ | 520/1110 [3:22:40<2:53:03, 17.60s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 47%|███████████████████████████████████▌ | 520/1110 [3:22:40<2:53:03, 17.60s/it][WARNING|modeling_bart.py:1051] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:28,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:31,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:31,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:35,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:35,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:38,833 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7027, 'learning_rate': 0.00029114754098360655, 'epoch': 4.69} +[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
+{'loss': 3.6879, 'learning_rate': 0.000290655737704918, 'epoch': 4.7}
+{'loss': 2.9378, 'learning_rate': 0.00029016393442622945, 'epoch': 4.71}
+{'loss': 2.511, 'learning_rate': 0.00028967213114754095, 'epoch': 4.72}
+{'loss': 2.1899, 'learning_rate': 0.00028918032786885245, 'epoch': 4.73}
+{'loss': 1.9893, 'learning_rate': 0.0002886885245901639, 'epoch': 4.74}
+ 47%|████████████████████████████████████ | 527/1110 [3:25:50<4:11:02, 25.84s/it]
+{'loss': 1.7267, 'learning_rate': 0.0002881967213114754, 'epoch': 4.74}
+{'loss': 1.5322, 'learning_rate': 0.00028770491803278684, 'epoch': 4.75}
+{'loss': 1.4467, 'learning_rate': 0.00028721311475409834, 'epoch': 4.76}
+{'loss': 1.3211, 'learning_rate': 0.0002867213114754098, 'epoch': 4.77}
+{'loss': 1.1658, 'learning_rate': 0.0002862295081967213, 'epoch': 4.78}
+{'loss': 1.1107, 'learning_rate': 0.0002857377049180328, 'epoch': 4.79}
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:30:01,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.0426, 'learning_rate': 0.00028524590163934424, 'epoch': 4.8}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:30:09,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.1146, 'learning_rate': 0.0002847540983606557, 'epoch': 4.81}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:30:43,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 48%|████████████████████████████████████▋ | 535/1110 [3:29:06<3:49:04, 23.90s/it]
+{'loss': 0.9805, 'learning_rate': 0.0002837704918032787, 'epoch': 4.83}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:31:23,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:31:34,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:31:38,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:31:42,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:31:48,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 48%|████████████████████████████████████▊ | 538/1110 [3:30:12<3:33:16, 22.37s/it]
+{'loss': 0.825, 'learning_rate': 0.00028278688524590163, 'epoch': 4.84}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:02,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:04,619 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:10,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:12,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:15,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.8417, 'learning_rate': 0.0002822950819672131, 'epoch': 4.85}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:20,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:22,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:24,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:26,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:29,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:30,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:33,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:34,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:36,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:38,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:40,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:42,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:45,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:47,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:48,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:50,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:53,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:54,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:57,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:32:59,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:00,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:02,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:05,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:07,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:09,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:11,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:13,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:15,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:18,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:21,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:25,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:29,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:33:29,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:33:32,499 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:33:35,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:33:35,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:33:39,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:33:39,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:42,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:46,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:49,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:53,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:33:56,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:34:00,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:34:03,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:34:06,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.4873, 'learning_rate': 0.0002788524590163934, 'epoch': 4.91}
+{'loss': 1.9983, 'learning_rate': 0.00027836065573770487, 'epoch': 4.92}
+{'loss': 1.6671, 'learning_rate': 0.00027786885245901637, 'epoch': 4.93}
+{'loss': 1.535, 'learning_rate': 0.00027737704918032787, 'epoch': 4.94}
+{'loss': 1.2795, 'learning_rate': 0.0002768852459016393, 'epoch': 4.95}
+{'loss': 1.1159, 'learning_rate': 0.00027639344262295076, 'epoch': 4.96}
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:36:41,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.9898, 'learning_rate': 0.00027590163934426227, 'epoch': 4.97}
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:36:51,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:36:57,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.8807, 'learning_rate': 0.00027540983606557377, 'epoch': 4.98}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:04,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:06,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:08,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:10,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:12,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:14,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:16,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:18,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:19,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:22,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:25,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:27,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:28,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:31,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:32,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:35,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:38,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:38,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:37:42,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:37:42,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:37:45,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:37:49,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:37:49,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:37:53,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:37:53,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:37:53,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:37:53,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9571, 'learning_rate': 0.00027393442622950816, 'epoch': 5.01} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.5384, 'learning_rate': 0.00027344262295081966, 'epoch': 5.02} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2256, 'learning_rate': 0.0002729508196721311, 'epoch': 5.03} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.0547, 'learning_rate': 0.0002724590163934426, 'epoch': 5.04} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9737, 'learning_rate': 0.0002719672131147541, 'epoch': 5.04} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.8202, 'learning_rate': 0.00027147540983606556, 'epoch': 5.05} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.7816, 'learning_rate': 0.000270983606557377, 'epoch': 5.06} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.6287, 'learning_rate': 0.00027, 'epoch': 5.08} + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5932, 'learning_rate': 0.00026950819672131145, 'epoch': 5.09} + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5249, 'learning_rate': 0.0002690163934426229, 'epoch': 5.1} + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 567/1110 [3:41:10<3:54:19, 25.89s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 567/1110 [3:41:10<3:54:19, 25.89s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5345, 'learning_rate': 0.0002685245901639344, 'epoch': 5.11} + 51%|██████████████████████████████████████▊ | 567/1110 [3:41:10<3:54:19, 25.89s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 51%|██████████████████████████████████████▊ | 567/1110 [3:41:10<3:54:19, 25.89s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:43:11,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.5351, 'learning_rate': 0.0002680327868852459, 'epoch': 5.12}
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:43:25,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.4818, 'learning_rate': 0.00026754098360655734, 'epoch': 5.13}
+ 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it]
+ 51%|███████████████████████████████████████ | 571/1110 [3:42:45<3:35:31, 23.99s/it]
+{'loss': 0.4339, 'learning_rate': 0.00026655737704918035, 'epoch': 5.14}
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:44:35,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 20:44:49,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.4029, 'learning_rate': 0.0002660655737704918, 'epoch': 5.15}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:44:57,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:45:09,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 52%|███████████████████████████████████████▏ | 573/1110 [3:43:30<3:28:41, 23.32s/it]
+{'loss': 0.4651, 'learning_rate': 0.00026557377049180324, 'epoch': 5.16}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:45:24,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:28,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 52%|███████████████████████████████████████▎ | 574/1110 [3:43:50<3:20:12, 22.41s/it]
+{'loss': 0.3721, 'learning_rate': 0.00026508196721311474, 'epoch': 5.17}
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:38,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:40,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 20:45:44,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:48,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:51,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 52%|███████████████████████████████████████▎ | 575/1110 [3:44:09<3:09:43, 21.28s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 20:45:54,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:58,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:00,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:03,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:05,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:07,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:09,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:11,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:13,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:14,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:16,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:18,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:20,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:23,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:25,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:26,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:29,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:31,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:32,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:35,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:37,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:39,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:41,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:42,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:46,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:48,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:49,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:51,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:53,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:56,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:00,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:03,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:07,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:11,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:11,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:14,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:14,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:18,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:21,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:21,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.1772, 'learning_rate': 0.0002616393442622951, 'epoch': 5.23} +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:25,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:25,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:29,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:32,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:32,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:36,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:36,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:39,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:43,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:43,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:43,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:43,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▊ | 582/1110 [3:46:07<3:01:13, 20.59s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 52%|███████████████████████████████████████▊ | 582/1110 [3:46:07<3:01:13, 20.59s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.0168, 'learning_rate': 0.00026114754098360653, 'epoch': 5.24}
+{'loss': 1.2479, 'learning_rate': 0.000260655737704918, 'epoch': 5.25}
+{'loss': 1.2017, 'learning_rate': 0.0002601639344262295, 'epoch': 5.26}
+{'loss': 1.1079, 'learning_rate': 0.000259672131147541, 'epoch': 5.27}
+{'loss': 0.8699, 'learning_rate': 0.0002591803278688524, 'epoch': 5.28}
+{'loss': 0.7881, 'learning_rate': 0.0002586885245901639, 'epoch': 5.29}
+{'loss': 0.7279, 'learning_rate': 0.00025819672131147537, 'epoch': 5.3}
+{'loss': 0.6572, 'learning_rate': 0.00025770491803278687, 'epoch': 5.3}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.6395, 'learning_rate': 0.0002572131147540983, 'epoch': 5.31}
+{'loss': 0.5414, 'learning_rate': 0.0002567213114754098, 'epoch': 5.32}
+[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5121, 'learning_rate': 0.0002562295081967213, 'epoch': 5.33} +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.4897, 'learning_rate': 0.00025573770491803277, 'epoch': 5.34} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.4312, 'learning_rate': 0.0002552459016393442, 'epoch': 5.35} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4397, 'learning_rate': 0.0002547540983606557, 'epoch': 5.36} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3896, 'learning_rate': 0.0002542622950819672, 'epoch': 5.37} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:01,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:01,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:01,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:01,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:09,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:54:09,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:09,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3908, 'learning_rate': 0.00025377049180327866, 'epoch': 5.38} +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:09,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:18,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:18,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:18,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:18,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:54:18,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:30,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:30,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:30,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3858, 'learning_rate': 0.00025327868852459016, 'epoch': 5.39} +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:30,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:30,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:30,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:54:30,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:44,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:44,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:44,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:50,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:50,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:50,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3354, 'learning_rate': 0.0002527868852459016, 'epoch': 5.39} +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:57,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:57,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:54:57,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:03,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:03,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:55:07,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:55:07,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:55:11,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:11,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3314, 'learning_rate': 0.0002522950819672131, 'epoch': 5.4} +[WARNING|modeling_bart.py:1051] 2022-03-28 20:55:15,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:55:15,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:19,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:21,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:23,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:55:25,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:27,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:29,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:29,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:31,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:33,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:35,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:55:37,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:39,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:40,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:42,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:42,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:44,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:47,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:55:49,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:52,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:53,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:56,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:56,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:55:57,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:00,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:56:02,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:02,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:05,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:05,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:07,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:09,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:12,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:56:13,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:13,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3007, 'learning_rate': 0.00024983606557377045, 'epoch': 5.45} +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:17,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:17,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:21,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:21,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:24,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:56:28,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:28,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:32,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:32,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:35,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:35,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:39,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:56:42,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:42,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9302, 'learning_rate': 0.00024934426229508195, 'epoch': 5.46} +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:46,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:46,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:49,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:53,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:53,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:56:56,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:56:56,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:00,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.4471, 'learning_rate': 0.0002488524590163934, 'epoch': 5.47} +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.0509, 'learning_rate': 0.0002483606557377049, 'epoch': 5.48} +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.9157, 'learning_rate': 0.0002478688524590164, 'epoch': 5.48} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.8162, 'learning_rate': 0.00024737704918032785, 'epoch': 5.49} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.6961, 'learning_rate': 0.0002468852459016393, 'epoch': 5.5} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.6048, 'learning_rate': 0.0002463934426229508, 'epoch': 5.51} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.6278, 'learning_rate': 0.0002459016393442623, 'epoch': 5.52} + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5126, 'learning_rate': 0.00024540983606557374, 'epoch': 5.53} + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4451, 'learning_rate': 0.0002449180327868852, 'epoch': 5.54} + 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]
+{'loss': 0.4496, 'learning_rate': 0.0002444262295081967, 'epoch': 5.55}
+{'loss': 0.3534, 'learning_rate': 0.00024393442622950816, 'epoch': 5.56}
+{'loss': 0.3675, 'learning_rate': 0.00024344262295081966, 'epoch': 5.57}
+ 56%|██████████████████████████████████████████▍ | 619/1110 [4:00:39<3:22:57, 24.80s/it]
+{'loss': 0.3711, 'learning_rate': 0.0002424590163934426, 'epoch': 5.58}
+{'loss': 0.3521, 'learning_rate': 0.00024196721311475406, 'epoch': 5.59}
+{'loss': 0.3142, 'learning_rate': 0.00024147540983606556, 'epoch': 5.6}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:03:36,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:03:40,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:03:52,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:03:57,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:04:07,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:04:13,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.2855, 'learning_rate': 0.0002404918032786885, 'epoch': 5.62}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:04:19,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:04:25,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:29,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:04:33,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.3093, 'learning_rate': 0.00023999999999999998, 'epoch': 5.63}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:38,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:40,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:45,567 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:47,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:49,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 56%|██████████████████████████████████████████▊ | 626/1110 [4:03:07<2:42:50, 20.19s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:04:51,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:53,746 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:55,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:57,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:59,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:01,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:02,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 56%|██████████████████████████████████████████▉ | 627/1110 [4:03:22<2:29:13, 18.54s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:05:06,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:07,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:09,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:12,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:14,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:15,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 57%|██████████████████████████████████████████▉ | 628/1110 [4:03:34<2:13:26, 16.61s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:05:18,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:19,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:22,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:24,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:30,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:32,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:34,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 57%|███████████████████████████████████████████▏ | 630/1110 [4:03:52<1:40:30, 12.56s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:05:37,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:40,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:44,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:48,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:51,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:55,413 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-28 21:05:37,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:58,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:05:37,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:58,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:05:37,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:02,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:05:37,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▏ | 631/1110 [4:04:21<2:19:49, 17.51s/it] Setting `use_cache=False`...1] 2022-03-28 21:05:37,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▏ | 631/1110 [4:04:21<2:19:49, 17.51s/it] Setting `use_cache=False`...1] 2022-03-28 21:05:37,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▏ | 631/1110 [4:04:21<2:19:49, 17.51s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▏ | 631/1110 [4:04:21<2:19:49, 17.51s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:09,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:13,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:13,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:16,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:16,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.832, 'learning_rate': 0.0002365573770491803, 'epoch': 5.69} +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.6695, 'learning_rate': 0.00023606557377049177, 'epoch': 5.7} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.6161, 'learning_rate': 0.00023557377049180327, 'epoch': 5.71} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.5685, 'learning_rate': 0.00023508196721311474, 'epoch': 5.72} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5094, 'learning_rate': 0.00023459016393442622, 'epoch': 5.73} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4618, 'learning_rate': 0.00023409836065573766, 'epoch': 5.74} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4221, 'learning_rate': 0.00023360655737704916, 'epoch': 5.74} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3998, 'learning_rate': 0.00023311475409836064, 'epoch': 5.75} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.369, 'learning_rate': 0.0002326229508196721, 'epoch': 5.76} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3503, 'learning_rate': 0.00023213114754098358, 'epoch': 5.77} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3426, 'learning_rate': 0.00023163934426229506, 'epoch': 5.78} + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3081, 'learning_rate': 0.00023114754098360653, 'epoch': 5.79} + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.321, 'learning_rate': 0.000230655737704918, 'epoch': 5.8} + 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3081, 'learning_rate': 0.00023016393442622948, 'epoch': 5.81} + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▏ | 646/1110 [4:10:46<3:04:34, 23.87s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▏ | 646/1110 [4:10:46<3:04:34, 23.87s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3378, 'learning_rate': 0.00022967213114754098, 'epoch': 5.82} + 58%|████████████████████████████████████████████▏ | 646/1110 [4:10:46<3:04:34, 23.87s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:12:36,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:12:36,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:12:36,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:12:42,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:42,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:42,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:42,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:42,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▎ | 647/1110 [4:11:08<2:59:45, 23.29s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▎ | 647/1110 [4:11:08<2:59:45, 23.29s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2946, 'learning_rate': 0.00022918032786885245, 'epoch': 5.83} + 58%|████████████████████████████████████████████▎ | 647/1110 [4:11:08<2:59:45, 23.29s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:59,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:59,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:59,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:59,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:59,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:12:59,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:13:11,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:13:11,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▎ | 648/1110 [4:11:31<2:58:29, 23.18s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▎ | 648/1110 [4:11:31<2:58:29, 23.18s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3051, 'learning_rate': 0.0002286885245901639, 'epoch': 5.83} + 58%|████████████████████████████████████████████▎ | 648/1110 [4:11:31<2:58:29, 23.18s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▎ | 648/1110 [4:11:31<2:58:29, 23.18s/it]g-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:23,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:23,728 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:13:27,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:13:27,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:13:27,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:33,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▍ | 649/1110 [4:11:51<2:51:05, 22.27s/it] Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 58%|████████████████████████████████████████████▍ | 649/1110 [4:11:51<2:51:05, 22.27s/it] Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2399, 'learning_rate': 0.00022819672131147537, 'epoch': 5.84} +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:39,797 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:42,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:42,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:42,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:48,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:50,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:52,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:52,526 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 650/1110 [4:12:10<2:42:34, 21.20s/it] Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 59%|████████████████████████████████████████████▌ | 650/1110 [4:12:10<2:42:34, 21.20s/it] Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:58,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:00,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:02,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:04,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:06,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:08,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:10,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:10,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:12,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:14,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:16,211 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:18,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:19,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:21,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:24,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:26,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:28,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:31,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:32,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:35,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:36,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:39,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:40,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:42,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:43,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:47,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:49,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:50,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:53,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:54,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:57,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:01,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:04,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:08,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:11,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:15,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:18,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:22,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.8577, 'learning_rate': 0.00022475409836065572, 'epoch': 5.91}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:25,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:29,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:32,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:35,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:39,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:42,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.6569, 'learning_rate': 0.0002242622950819672, 'epoch': 5.91}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.5387, 'learning_rate': 0.0002237704918032787, 'epoch': 5.92}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.4774, 'learning_rate': 0.00022327868852459014, 'epoch': 5.93}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.4175, 'learning_rate': 0.0002227868852459016, 'epoch': 5.94}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.3658, 'learning_rate': 0.00022229508196721309, 'epoch': 5.95}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:17:49,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:17:55,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.3594, 'learning_rate': 0.00022180327868852459, 'epoch': 5.96}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:17:55,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 60%|█████████████████████████████████████████████▍ | 663/1110 [4:16:36<2:55:39, 23.58s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.3114, 'learning_rate': 0.00022131147540983606, 'epoch': 5.97}
+ 60%|█████████████████████████████████████████████▍ | 663/1110 [4:16:36<2:55:39, 23.58s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:18:26,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:34,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:18:38,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.2732, 'learning_rate': 0.0002208196721311475, 'epoch': 5.98} +[WARNING|modeling_utils.py:388] 2022-03-28 21:18:44,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:18:46,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:18:46,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:50,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:52,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:54,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:56,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:56,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:58,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:59,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:01,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:04,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:06,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:08,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:08,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:09,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:12,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:12,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:16,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:16,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:20,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:20,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:23,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:27,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:27,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.6729, 'learning_rate': 0.00021934426229508195, 'epoch': 6.01} +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4497, 'learning_rate': 0.00021885245901639343, 'epoch': 6.02} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3901, 'learning_rate': 0.0002183606557377049, 'epoch': 6.03} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.3872, 'learning_rate': 0.00021786885245901638, 'epoch': 6.04} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3052, 'learning_rate': 0.00021737704918032785, 'epoch': 6.04} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2726, 'learning_rate': 0.00021688524590163932, 'epoch': 6.05} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. 
 Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2653, 'learning_rate': 0.0002163934426229508, 'epoch': 6.06} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2387, 'learning_rate': 0.0002159016393442623, 'epoch': 6.07} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2052, 'learning_rate': 0.00021540983606557374, 'epoch': 6.08} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2342, 'learning_rate': 0.00021491803278688522, 'epoch': 6.09} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1993, 'learning_rate': 0.0002144262295081967, 'epoch': 6.1} + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1983, 'learning_rate': 0.0002139344262295082, 'epoch': 6.11} + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1869, 'learning_rate': 0.00021344262295081967, 'epoch': 6.12} + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|███████████████████████████████████���██████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1725, 'learning_rate': 0.00021295081967213114, 'epoch': 6.13} + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1686, 'learning_rate': 0.00021245901639344259, 'epoch': 6.13} +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1684, 'learning_rate': 0.0002119672131147541, 'epoch': 6.14} +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:26:29,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.1432, 'learning_rate': 0.00021147540983606556, 'epoch': 6.15}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:26:47,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 62%|██████████████████████████████████████████████▊ | 684/1110 [4:25:08<2:42:04, 22.83s/it]
+{'loss': 0.1873, 'learning_rate': 0.00021098360655737703, 'epoch': 6.16}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:26:58,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:06,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:12,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1583, 'learning_rate': 0.0002104918032786885, 'epoch': 6.17}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:19,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:22,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:26,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:30,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1472, 'learning_rate': 0.00020999999999999998, 'epoch': 6.18}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:36,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:38,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:40,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:43,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:45,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:47,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:27:49,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1488, 'learning_rate': 0.00020950819672131146, 'epoch': 6.19}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:53,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:54,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:56,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:58,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:00,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:02,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 62%|███████████████████████████████████████████████ | 688/1110 [4:26:21<2:11:05, 18.64s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:28:05,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:07,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:08,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:10,371 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:13,366 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:14,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 62%|███████████████████████████████████████████████▏ | 689/1110 [4:26:33<1:56:59, 16.67s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:28:17,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:18,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:21,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:23,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:25,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 62%|███████████████████████████████████████████████▏ | 690/1110 [4:26:43<1:41:23, 14.48s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:28:26,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:29,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:31,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:33,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 62%|███████████████████████████████████████████████▎ | 691/1110 [4:26:51<1:28:28, 12.67s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:28:36,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:40,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:43,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:47,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:50,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:54,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:58,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:01,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 62%|███████████████████████████████████████████████▍ | 692/1110 [4:27:20<2:02:39, 17.61s/it][WARNING|modeling_bart.py:1051] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:08,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:12,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:15,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:19,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
+{'loss': 0.6188, 'learning_rate': 0.0002065573770491803, 'epoch': 6.24}
+{'loss': 0.423, 'learning_rate': 0.0002060655737704918, 'epoch': 6.25}
+{'loss': 0.3769, 'learning_rate': 0.00020557377049180327, 'epoch': 6.26}
+{'loss': 0.376, 'learning_rate': 0.00020508196721311475, 'epoch': 6.27}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2984, 'learning_rate': 0.0002045901639344262, 'epoch': 6.28} +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 698/1110 [4:30:04<2:58:30, 26.00s/it] Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▊ | 698/1110 [4:30:04<2:58:30, 26.00s/it] Setting `use_cache=False`...1] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2631, 'learning_rate': 0.0002040983606557377, 'epoch': 6.29} +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.254, 'learning_rate': 0.00020360655737704917, 'epoch': 6.3} +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.252, 'learning_rate': 0.00020311475409836064, 'epoch': 6.3} +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2213, 'learning_rate': 0.00020262295081967211, 'epoch': 6.31} + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2044, 'learning_rate': 0.00020213114754098356, 'epoch': 6.32} + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1789, 'learning_rate': 0.00020163934426229506, 'epoch': 6.33} + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|██████████████████████████��████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2126, 'learning_rate': 0.00020114754098360653, 'epoch': 6.34} + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1658, 'learning_rate': 0.000200655737704918, 'epoch': 6.35} + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1732, 'learning_rate': 0.0002001639344262295, 'epoch': 6.36} +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:35:17,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:35:17,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:35:17,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1748, 'learning_rate': 0.00019967213114754098, 'epoch': 6.37} +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:46,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:46,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1638, 'learning_rate': 0.00019918032786885243, 'epoch': 6.38} +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:36:12,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:12,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.157, 'learning_rate': 0.0001986885245901639, 'epoch': 6.39} +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:16,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:16,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:16,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:23,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:23,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:36:23,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:23,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:31,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:31,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:31,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:31,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:37,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:36:37,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:37,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:43,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:43,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:36:47,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:36:47,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:51,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:36:54,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:54,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:54,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1552, 'learning_rate': 0.00019770491803278688, 'epoch': 6.4} +[WARNING|modeling_utils.py:388] 2022-03-28 21:36:59,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:01,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:04,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:06,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:37:08,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:10,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:12,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:12,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:14,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:16,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:18,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:37:19,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:21,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:23,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:25,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:25,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:26,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:30,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:37:31,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:34,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:35,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:38,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:38,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:40,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:42,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:37:44,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:46,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:48,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:48,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:50,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:52,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:54,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:37:56,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:37:56,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1373, 'learning_rate': 0.00019524590163934425, 'epoch': 6.45} +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:00,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:00,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:03,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:03,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:07,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:38:07,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:10,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:14,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:14,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:17,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:17,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:21,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:38:25,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:25,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.5316, 'learning_rate': 0.00019475409836065572, 'epoch': 6.46} +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:28,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:28,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:32,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:32,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:35,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:38:39,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:39,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.4373, 'learning_rate': 0.00019426229508196722, 'epoch': 6.47}
+{'loss': 0.3649, 'learning_rate': 0.00019377049180327867, 'epoch': 6.48}
+{'loss': 0.3233, 'learning_rate': 0.00019327868852459014, 'epoch': 6.48}
+{'loss': 0.2931, 'learning_rate': 0.00019278688524590161, 'epoch': 6.49}
+{'loss': 0.2771, 'learning_rate': 0.00019229508196721312, 'epoch': 6.5}
+{'loss': 0.2295, 'learning_rate': 0.0001918032786885246, 'epoch': 6.51}
+{'loss': 0.2316, 'learning_rate': 0.00019131147540983604, 'epoch': 6.52}
+{'loss': 0.2157, 'learning_rate': 0.0001908196721311475, 'epoch': 6.53}
+{'loss': 0.2125, 'learning_rate': 0.000190327868852459, 'epoch': 6.54}
+{'loss': 0.1578, 'learning_rate': 0.00018983606557377048, 'epoch': 6.55}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1633, 'learning_rate': 0.00018934426229508196, 'epoch': 6.56} +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1553, 'learning_rate': 0.00018885245901639343, 'epoch': 6.57} +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:38:42,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|█████████████████████████████████████████████████▉ | 730/1110 [4:42:21<2:38:13, 24.98s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 66%|█████████████████████████████████████████████████▉ | 730/1110 [4:42:21<2:38:13, 24.98s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1544, 'learning_rate': 0.00018836065573770488, 'epoch': 6.57}
+ 66%|█████████████████████████████████████████████████▉ | 730/1110 [4:42:21<2:38:13, 24.98s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:44:32,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:44:38,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:44:48,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1519, 'learning_rate': 0.00018737704918032785, 'epoch': 6.59}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:44:48,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:45:09,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:45:13,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1557, 'learning_rate': 0.00018688524590163933, 'epoch': 6.6}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:45:17,464 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1656, 'learning_rate': 0.00018639344262295083, 'epoch': 6.61}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:45:17,464 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:45:39,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:45:56,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.1322, 'learning_rate': 0.00018590163934426227, 'epoch': 6.62}
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:45:56,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:46:02,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:06,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:46:10,746 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:14,663 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:46:18,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:46:20,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:24,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:26,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:28,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:30,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:32,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:34,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:36,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:38,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:40,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:41,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:45,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:46,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:48,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:50,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:46:53,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:46:54,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:46:57,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:46:58,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:46:58,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:01,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:03,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:47:05,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:05,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:07,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:09,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:11,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:12,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:15,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:47:15,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:16,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:16,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:20,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:23,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:23,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:27,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:47:27,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:30,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:30,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:34,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:38,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:38,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:41,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:47:41,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:41,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:45,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:45,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:48,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:52,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:52,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:47:55,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:55,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:47:59,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:48:02,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:48:02,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:48:06,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:48:06,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:48:06,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.4009, 'learning_rate': 0.0001819672131147541, 'epoch': 6.69} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3291, 'learning_rate': 0.00018147540983606556, 'epoch': 6.7} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2757, 'learning_rate': 0.00018098360655737704, 'epoch': 6.71} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2992, 'learning_rate': 0.00018049180327868848, 'epoch': 6.72} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2224, 'learning_rate': 0.00017999999999999998, 'epoch': 6.73} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2268, 'learning_rate': 0.00017950819672131146, 'epoch': 6.74} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2091, 'learning_rate': 0.00017901639344262293, 'epoch': 6.74} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2108, 'learning_rate': 0.00017852459016393443, 'epoch': 6.75} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1898, 'learning_rate': 0.00017803278688524588, 'epoch': 6.76} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1764, 'learning_rate': 0.00017754098360655735, 'epoch': 6.77} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1669, 'learning_rate': 0.00017704918032786883, 'epoch': 6.78} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1502, 'learning_rate': 0.00017655737704918033, 'epoch': 6.79} + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1734, 'learning_rate': 0.0001760655737704918, 'epoch': 6.8} + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:53:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:53:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1519, 'learning_rate': 0.00017557377049180327, 'epoch': 6.81} +[WARNING|modeling_bart.py:1051] 2022-03-28 21:53:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:53:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:53:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:53:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1561, 'learning_rate': 0.00017508196721311472, 'epoch': 6.82} +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1471, 'learning_rate': 0.0001745901639344262, 'epoch': 6.83} +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:41,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:41,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:41,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:41,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:41,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:41,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 68%|███████████████████████████████████████████████████▉ | 759/1110 [4:53:10<2:13:13, 22.77s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|███████████████████████████████████████████████████▉ | 759/1110 [4:53:10<2:13:13, 22.77s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:12,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:12,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|████████████████████████████████████████████████████ | 760/1110 [4:53:32<2:12:00, 22.63s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 68%|████████████████████████████████████████████████████ | 760/1110 [4:53:32<2:12:00, 22.63s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1161, 'learning_rate': 0.00017360655737704917, 'epoch': 6.84} +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:20,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:23,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:23,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:26,999 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:26,999 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:26,999 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:32,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:55:35,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:35,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1166, 'learning_rate': 0.00017311475409836064, 'epoch': 6.85} +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:39,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:41,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:43,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:45,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:47,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:49,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:51,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:51,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:53,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:53,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:56,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:55:58,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:56:00,423 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:02,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:03,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:07,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:07,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:08,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:10,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:56:13,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:14,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:16,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:19,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:19,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:20,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:23,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:56:25,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:27,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:27,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:29,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:31,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:33,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:35,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:56:36,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:36,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:38,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:38,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:42,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:42,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:45,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:56:49,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:49,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:52,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:52,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:56,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:59,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:56:59,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:03,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:03,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:03,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:06,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:06,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:10,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:13,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:13,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:17,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:20,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:20,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.3025, 'learning_rate': 0.00016967213114754096, 'epoch': 6.91} +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2683, 'learning_rate': 0.00016918032786885243, 'epoch': 6.92} +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.2023, 'learning_rate': 0.00016868852459016393, 'epoch': 6.93} +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.184, 'learning_rate': 0.0001681967213114754, 'epoch': 6.94} +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1605, 'learning_rate': 0.00016770491803278688, 'epoch': 6.95}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:59:31,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 21:59:31,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.142, 'learning_rate': 0.00016721311475409833, 'epoch': 6.96}
+[WARNING|modeling_utils.py:388] 2022-03-28 21:59:31,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:59:48,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:59:52,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 21:59:56,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:00:00,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.1362, 'learning_rate': 0.0001667213114754098, 'epoch': 6.97}
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:00:00,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:00:11,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:16,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:00:21,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.1242, 'learning_rate': 0.0001662295081967213, 'epoch': 6.98}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:25,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:27,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:29,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:31,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:33,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:35,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:37,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:37,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:39,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:40,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:43,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:45,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:47,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:49,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:49,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:51,653 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:52,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:54,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:00:58,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:02,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:05,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:09,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.3292, 'learning_rate': 0.00016475409836065575, 'epoch': 7.01}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1917, 'learning_rate': 0.0001642622950819672, 'epoch': 7.02}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1808, 'learning_rate': 0.00016377049180327867, 'epoch': 7.03}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]
+{'loss': 0.1459, 'learning_rate': 0.00016327868852459014, 'epoch': 7.04}
+ 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]
+ 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]
+ 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1363, 'learning_rate': 0.00016278688524590164, 'epoch': 7.04} + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1299, 'learning_rate': 0.00016229508196721312, 'epoch': 7.05}
+{'loss': 0.1337, 'learning_rate': 0.00016180327868852456, 'epoch': 7.06}
+{'loss': 0.1175, 'learning_rate': 0.00016131147540983604, 'epoch': 7.07}
+{'loss': 0.1139, 'learning_rate': 0.00016081967213114754, 'epoch': 7.08}
+{'loss': 0.1132, 'learning_rate': 0.000160327868852459, 'epoch': 7.09}
+{'loss': 0.0938, 'learning_rate': 0.00015983606557377049, 'epoch': 7.1}
+{'loss': 0.0853, 'learning_rate': 0.00015934426229508193, 'epoch': 7.11}
+ 71%|██████████████████████████████████████████████████████ | 790/1110 [5:04:54<2:12:51, 24.91s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0911, 'learning_rate': 0.0001588524590163934, 'epoch': 7.12}
+{'loss': 0.0876, 'learning_rate': 0.0001583606557377049, 'epoch': 7.13}
+ 71%|██████████████████████████████████████████████████████▏ | 792/1110 [5:05:43<2:10:13, 24.57s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0952, 'learning_rate': 0.00015786885245901638, 'epoch': 7.13}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:07:35,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 22:07:35,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:07:35,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:07:35,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|██████████████████████████████████████████████████████▎ | 793/1110 [5:06:06<2:06:56, 24.03s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 71%|██████████████████████████████████████████████████████▎ | 793/1110 [5:06:06<2:06:56, 24.03s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0965, 'learning_rate': 0.00015737704918032785, 'epoch': 7.14} +[WARNING|modeling_utils.py:388] 2022-03-28 22:07:53,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:07:53,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 22:07:53,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:07:53,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:02,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:02,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:02,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:02,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:02,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 72%|██████████████████████████████████████████████████████▎ | 794/1110 [5:06:28<2:03:15, 23.40s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▎ | 794/1110 [5:06:28<2:03:15, 23.40s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:14,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:14,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:14,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:14,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:14,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:08:24,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:24,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:28,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:28,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▍ | 795/1110 [5:06:49<1:59:10, 22.70s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▍ | 795/1110 [5:06:49<1:59:10, 22.70s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0931, 'learning_rate': 0.0001563934426229508, 'epoch': 7.16} + 72%|██████████████████████████████████████████████████████▍ | 795/1110 [5:06:49<1:59:10, 22.70s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 72%|██████████████████████████████████████████████████████▍ | 795/1110 [5:06:49<1:59:10, 22.70s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▍ | 795/1110 [5:06:49<1:59:10, 22.70s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▍ | 795/1110 [5:06:49<1:59:10, 22.70s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 72%|██████████████████████████████████████████████████████▍ | 795/1110 [5:06:49<1:59:10, 22.70s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:46,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:46,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:46,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:08:52,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:08:52,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0801, 'learning_rate': 0.00015590163934426228, 'epoch': 7.17} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:08:57,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:08:57,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:01,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:01,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:01,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:07,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:07,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:11,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:13,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:13,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.081, 'learning_rate': 0.00015540983606557375, 'epoch': 7.18} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:17,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:19,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:21,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:23,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:25,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:25,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:29,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:29,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:31,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:09:33,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:35,086 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:36,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:38,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:40,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:43,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:43,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:09:45,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:47,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:50,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:51,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:53,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:55,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:55,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:09:58,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:09:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:01,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:03,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:03,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:06,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:07,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:10:10,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:12,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:12,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:14,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:14,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:18,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:18,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:10:21,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:21,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:25,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:28,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:28,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:32,472 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:32,472 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:10:32,472 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:37,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:41,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:41,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:41,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:45,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:45,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:10:48,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:48,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:52,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:55,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:55,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:58,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:10:58,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:11:02,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2267, 'learning_rate': 0.0001519672131147541, 'epoch': 7.24} +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.18, 'learning_rate': 0.000150983606557377, 'epoch': 7.26} + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1531, 'learning_rate': 0.0001504918032786885, 'epoch': 7.27} + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 0.1334, 'learning_rate': 0.00015, 'epoch': 7.28} + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|█████████████████████████████████████████���█████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1211, 'learning_rate': 0.00014950819672131146, 'epoch': 7.29} + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1199, 'learning_rate': 0.00014901639344262293, 'epoch': 7.3} + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1123, 'learning_rate': 0.0001485245901639344, 'epoch': 7.3} + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 0.1059, 'learning_rate': 0.0001480327868852459, 'epoch': 7.31} + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|████���██████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0931, 'learning_rate': 0.00014754098360655736, 'epoch': 7.32} + 73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]
+{'loss': 0.0969, 'learning_rate': 0.00014704918032786886, 'epoch': 7.33}
+ 73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]
+{'loss': 0.0967, 'learning_rate': 0.0001465573770491803, 'epoch': 7.34}
+ 73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 22:16:12,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 74%|███████████████████████████████████████████████████████▊ | 816/1110 [5:14:38<2:03:10, 25.14s/it]
+{'loss': 0.0934, 'learning_rate': 0.00014606557377049178, 'epoch': 7.35}
+ 74%|███████████████████████████████████████████████████████▊ | 816/1110 [5:14:38<2:03:10, 25.14s/it]
+{'loss': 0.0883, 'learning_rate': 0.00014557377049180328, 'epoch': 7.36}
+ 74%|███████████████████████████████████████████████████████▊ | 816/1110 [5:14:38<2:03:10, 25.14s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 22:17:03,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 74%|████████████████████████████████████████████████████████ | 818/1110 [5:15:25<1:57:30, 24.14s/it]
+{'loss': 0.0828, 'learning_rate': 0.00014508196721311472, 'epoch': 7.37}
+ 74%|████████████████████████████████████████████████████████ | 818/1110 [5:15:25<1:57:30, 24.14s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:17:19,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 74%|████████████████████████████████████████████████████████ | 819/1110 [5:15:47<1:54:15, 23.56s/it]
+{'loss': 0.0905, 'learning_rate': 0.00014459016393442622, 'epoch': 7.38}
+ 74%|████████████████████████████████████████████████████████ | 819/1110 [5:15:47<1:54:15, 23.56s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 22:17:37,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0853, 'learning_rate': 0.0001440983606557377, 'epoch': 7.39}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:17:37,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:18:04,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:18:10,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0755, 'learning_rate': 0.00014360655737704917, 'epoch': 7.39}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:18:10,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:18,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:26,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 22:18:30,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:18:32,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0726, 'learning_rate': 0.00014311475409836065, 'epoch': 7.4}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:18:38,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:18:40,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:18:42,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:46,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:48,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:50,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:52,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:54,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:56,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:58,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:00,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:02,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:04,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:07,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:09,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:10,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:12,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:15,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:16,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:19,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:19,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:20,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:22,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:25,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:27,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:29,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:29,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:31,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:33,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:34,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:34,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:37,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:37,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:41,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:45,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:48,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:52,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:55,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:01,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:04,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:08,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:12,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:15,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:19,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:22,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.2098, 'learning_rate': 0.00013967213114754096, 'epoch': 7.47}
+{'loss': 0.1766, 'learning_rate': 0.00013918032786885243, 'epoch': 7.48}
+{'loss': 0.1588, 'learning_rate': 0.00013868852459016394, 'epoch': 7.48}
+{'loss': 0.143, 'learning_rate': 0.00013819672131147538, 'epoch': 7.49}
+{'loss': 0.1298, 'learning_rate': 0.00013770491803278688, 'epoch': 7.5}
+{'loss': 0.1163, 'learning_rate': 0.00013721311475409833, 'epoch': 7.51}
+{'loss': 0.1181, 'learning_rate': 0.00013672131147540983, 'epoch': 7.52}
+ 75%|█████████████████████████████████████████████████████████▏ | 836/1110 [5:21:57<1:58:22, 25.92s/it]
+{'loss': 0.116, 'learning_rate': 0.0001362295081967213, 'epoch': 7.53}
+{'loss': 0.0969, 'learning_rate': 0.00013573770491803278, 'epoch': 7.54}
+{'loss': 0.0956, 'learning_rate': 0.00013524590163934425, 'epoch': 7.55}
+{'loss': 0.0921, 'learning_rate': 0.00013475409836065573, 'epoch': 7.56}
+{'loss': 0.1068, 'learning_rate': 0.0001342622950819672, 'epoch': 7.57}
+ 76%|█████████████████████████████████████████████████████████▌ | 841/1110 [5:24:02<1:52:37, 25.12s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 22:25:50,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0797, 'learning_rate': 0.00013327868852459017, 'epoch': 7.58}
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:26:31,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0789, 'learning_rate': 0.00013278688524590162, 'epoch': 7.59}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:26:41,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:26:45,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:26:52,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:26:56,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:27:10,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:27:14,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0798, 'learning_rate': 0.00013180327868852457, 'epoch': 7.61}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:27:30,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:27:36,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0625, 'learning_rate': 0.00013131147540983604, 'epoch': 7.62}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:27:42,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:27:49,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:27:53,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:27:55,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0729, 'learning_rate': 0.00013081967213114754, 'epoch': 7.63}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:27:59,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:27:59,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:03,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:05,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:07,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:09,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:11,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:11,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 76%|██████████████████████████████████████████████████████████ | 848/1110 [5:26:29<1:28:02, 20.16s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:15,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:17,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:19,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:21,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:22,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:24,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:24,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 76%|██████████████████████████████████████████████████████████▏ | 849/1110 [5:26:43<1:20:15, 18.45s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:28:27,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:29,544 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:27,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:31,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:27,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:34,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:27,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:35,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:27,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:38,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:27,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:38,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:27,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▏ | 850/1110 [5:26:55<1:11:24, 16.48s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:28:39,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:42,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:39,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:43,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:39,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:45,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:39,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:47,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:39,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:47,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:39,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:49,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:48,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:52,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:48,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:54,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:48,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:54,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:48,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|███████████████████████████████████████████████████████████▊ | 852/1110 [5:27:12<52:09, 12.13s/it] Setting `use_cache=False`...1] 2022-03-28 22:28:48,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|███████████████████████████████████████████████████████████▊ | 852/1110 [5:27:12<52:09, 12.13s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:00,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:00,566 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:04,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:04,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:07,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:07,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:11,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:14,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:14,856 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:20,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:20,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:24,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▍ | 853/1110 [5:27:43<1:16:03, 17.76s/it] Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▍ | 853/1110 [5:27:43<1:16:03, 17.76s/it] Setting `use_cache=False`...1] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 77%|██████████████████████████████████████████████████████████▍ | 853/1110 [5:27:43<1:16:03, 17.76s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:31,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:31,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:34,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:34,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:38,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:41,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:41,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:45,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:45,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:48,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.2042, 'learning_rate': 0.00012737704918032786, 'epoch': 7.69} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1538, 'learning_rate': 0.00012688524590163933, 'epoch': 7.7} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1337, 'learning_rate': 0.0001263934426229508, 'epoch': 7.71} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1333, 'learning_rate': 0.00012590163934426228, 'epoch': 7.72} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1308, 'learning_rate': 0.00012540983606557378, 'epoch': 7.73} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1062, 'learning_rate': 0.00012491803278688523, 'epoch': 7.74} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1091, 'learning_rate': 0.0001244262295081967, 'epoch': 7.74} +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1118, 'learning_rate': 0.0001239344262295082, 'epoch': 7.75} +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0905, 'learning_rate': 0.00012295081967213115, 'epoch': 7.77} + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1006, 'learning_rate': 0.0001224590163934426, 'epoch': 7.78} + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0995, 'learning_rate': 0.00012196721311475408, 'epoch': 7.79} + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0921, 'learning_rate': 0.00012147540983606557, 'epoch': 7.8} + 78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 867/1110 [5:33:48<1:40:05, 24.71s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 867/1110 [5:33:48<1:40:05, 24.71s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0971, 'learning_rate': 0.00012098360655737703, 'epoch': 7.81} + 78%|███████████████████████████████████████████████████████████▎ | 867/1110 [5:33:48<1:40:05, 24.71s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▎ | 867/1110 [5:33:48<1:40:05, 24.71s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:40,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:40,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:35:44,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:44,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:44,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:44,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:44,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:35:44,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0866, 'learning_rate': 0.00011999999999999999, 'epoch': 7.83} + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]g-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:23,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:36:23,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:27,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:27,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:27,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:27,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:27,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:27,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:36:27,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0981, 'learning_rate': 0.00011950819672131146, 'epoch': 7.83} +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:42,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:42,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:46,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:46,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:46,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:52,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:36:52,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:52,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:58,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:58,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.072, 'learning_rate': 0.00011901639344262294, 'epoch': 7.84} +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:58,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:36:58,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:07,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:07,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:07,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:07,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:15,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:15,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:15,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:15,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:37:15,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:22,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:25,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:27,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:27,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:30,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:32,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:37:34,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:36,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:36,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:39,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:40,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:42,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:44,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:37:46,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:49,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:51,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:51,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:53,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:54,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:37:56,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:37:59,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:00,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:03,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:03,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:04,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:07,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:09,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:38:11,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:12,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:12,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:15,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:17,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:19,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:19,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0819, 'learning_rate': 0.00011606557377049179, 'epoch': 7.9} +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:23,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:23,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:27,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:27,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:30,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:34,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:34,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:38:37,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:37,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:37,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:43,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:43,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:47,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:47,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:38:50,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:50,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:54,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:54,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:38:57,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:00,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:00,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:39:04,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:04,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1619, 'learning_rate': 0.00011508196721311474, 'epoch': 7.91} +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.1235, 'learning_rate': 0.00011459016393442623, 'epoch': 7.92} +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0996, 'learning_rate': 0.00011409836065573769, 'epoch': 7.93}
+{'loss': 0.0896, 'learning_rate': 0.00011360655737704917, 'epoch': 7.94}
+{'loss': 0.0785, 'learning_rate': 0.00011311475409836063, 'epoch': 7.95}
+{'loss': 0.0756, 'learning_rate': 0.00011262295081967212, 'epoch': 7.96}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:41:48,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:01,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 22:42:05,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0629, 'learning_rate': 0.00011163934426229507, 'epoch': 7.98}
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:12,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:15,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:17,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:19,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:21,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:23,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 80%|████████████████████████████████████████████████████████████▋ | 887/1110 [5:40:40<1:17:51, 20.95s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:42:24,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:26,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:29,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:31,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:33,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 80%|████████████████████████████████████████████████████████████▊ | 888/1110 [5:40:51<1:06:29, 17.97s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:37,602 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:38,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:40,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:44,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:47,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:51,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:54,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:58,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:43:01,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.1714, 'learning_rate': 0.00011016393442622949, 'epoch': 8.01}
+{'loss': 0.1262, 'learning_rate': 0.00010967213114754098, 'epoch': 8.02}
+{'loss': 0.089, 'learning_rate': 0.00010918032786885245, 'epoch': 8.03}
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:43:01,888 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1003, 'learning_rate': 0.00010868852459016392, 'epoch': 8.04} + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0913, 'learning_rate': 0.0001081967213114754, 'epoch': 8.04} + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0799, 'learning_rate': 0.00010770491803278687, 'epoch': 8.05} + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0765, 'learning_rate': 0.00010721311475409835, 'epoch': 8.06} + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0802, 'learning_rate': 0.00010672131147540983, 'epoch': 8.07} + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0641, 'learning_rate': 0.00010622950819672129, 'epoch': 8.08} + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0614, 'learning_rate': 0.00010573770491803278, 'epoch': 8.09} + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.064, 'learning_rate': 0.00010524590163934425, 'epoch': 8.1} + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0568, 'learning_rate': 0.00010475409836065573, 'epoch': 8.11} + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0609, 'learning_rate': 0.0001042622950819672, 'epoch': 8.12} +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0569, 'learning_rate': 0.00010377049180327867, 'epoch': 8.13} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0584, 'learning_rate': 0.00010327868852459015, 'epoch': 8.13} +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0536, 'learning_rate': 0.00010278688524590164, 'epoch': 8.14} + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|█████████████████████████████████████████████████████████████▉ | 905/1110 [5:48:14<1:20:46, 23.64s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|█████████████████████████████████████████████████████████████▉ | 905/1110 [5:48:14<1:20:46, 23.64s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0542, 'learning_rate': 0.0001022950819672131, 'epoch': 8.15} + 82%|█████████████████████████████████████████████████████████████▉ | 905/1110 [5:48:14<1:20:46, 23.64s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|█████████████████████████████████████████████████████████████▉ | 905/1110 [5:48:14<1:20:46, 23.64s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 82%|█████████████████████████████████████████████████████████████▉ | 905/1110 [5:48:14<1:20:46, 23.64s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|█████████████████████████████████████████████████████████████▉ | 905/1110 [5:48:14<1:20:46, 23.64s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|█████████████████████████████████████████████████████████████▉ | 905/1110 [5:48:14<1:20:46, 23.64s/it] Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0597, 'learning_rate': 0.00010180327868852458, 'epoch': 8.16} +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:50:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:50:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:50:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 22:50:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:36,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:36,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:36,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0608, 'learning_rate': 0.00010131147540983606, 'epoch': 8.17} +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:43,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:43,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:43,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:50:43,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:50:51,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 22:50:51,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:55,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:50:55,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|██████████████████████████████████████████████████████████████▏ | 908/1110 [5:49:15<1:11:40, 21.29s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 82%|██████████████████████████████████████████████████████████████▏ | 908/1110 [5:49:15<1:11:40, 21.29s/it][WARNING|modeling_bart.py:1051] 2022-03-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0479, 'learning_rate': 0.00010081967213114753, 'epoch': 8.18} +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:03,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:05,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:05,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:05,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:11,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:13,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:15,499 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:51:17,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:17,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:19,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:21,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:23,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:25,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:26,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:51:28,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:32,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:32,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:33,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:35,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:36,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:39,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:51:41,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:43,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:43,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:45,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:47,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:49,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:51,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:51:53,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:53,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:56,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:57,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:59,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:51:59,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:00,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:04,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:04,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:07,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:07,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:11,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:11,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:14,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:14,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:18,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:22,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:25,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:29,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:32,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:36,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:39,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:43,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:46,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:51,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.1007, 'learning_rate': 9.737704918032786e-05, 'epoch': 8.24}
+{'loss': 0.0901, 'learning_rate': 9.688524590163933e-05, 'epoch': 8.25}
+{'loss': 0.0894, 'learning_rate': 9.639344262295081e-05, 'epoch': 8.26}
+{'loss': 0.0852, 'learning_rate': 9.59016393442623e-05, 'epoch': 8.27}
+{'loss': 0.0744, 'learning_rate': 9.540983606557375e-05, 'epoch': 8.28}
+{'loss': 0.067, 'learning_rate': 9.491803278688524e-05, 'epoch': 8.29}
+{'loss': 0.0717, 'learning_rate': 9.442622950819672e-05, 'epoch': 8.3}
+{'loss': 0.0604, 'learning_rate': 9.393442622950819e-05, 'epoch': 8.3}
+{'loss': 0.0617, 'learning_rate': 9.344262295081966e-05, 'epoch': 8.31}
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0527, 'learning_rate': 9.295081967213114e-05, 'epoch': 8.32} +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.057, 'learning_rate': 9.245901639344261e-05, 'epoch': 8.33} +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0499, 'learning_rate': 9.19672131147541e-05, 'epoch': 8.34} +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|███████████████████████████████████████████████████████████████▍ | 927/1110 [5:56:24<1:14:54, 24.56s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0546, 'learning_rate': 9.098360655737704e-05, 'epoch': 8.36} +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0565, 'learning_rate': 9.049180327868852e-05, 'epoch': 8.37} +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:08,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:08,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:59:12,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:12,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:12,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:12,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:12,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0507, 'learning_rate': 8.999999999999999e-05, 'epoch': 8.38} +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:12,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:12,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:59:27,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:27,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:27,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:27,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:35,325 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:35,325 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:39,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:59:39,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0496, 'learning_rate': 8.950819672131147e-05, 'epoch': 8.39} +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:39,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:39,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:39,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:49,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:49,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:49,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 22:59:55,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 22:59:55,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|███████████████████████████████████████████████████████████████▊ | 932/1110 [5:58:16<1:05:11, 21.98s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|███████████████████████████████████████████████████████████████▊ | 932/1110 [5:58:16<1:05:11, 21.98s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:00:01,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:00:01,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:06,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:06,279 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:00:10,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:00:12,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:00:12,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:16,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:16,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|███████████████████████████████████████████████████████████████▉ | 933/1110 [5:58:34<1:02:02, 21.03s/it] Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:00:20,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:00:20,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:24,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:26,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:26,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:00:30,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:00:32,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:00:32,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:36,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:36,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:38,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:40,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:42,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:44,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:46,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:47,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:51,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:52,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:54,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:57,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:58,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:01,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:02,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:05,469 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:06,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:08,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:11,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:13,145 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:15,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:17,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:19,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0566, 'learning_rate': 8.60655737704918e-05, 'epoch': 8.45}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:22,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:26,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:29,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:33,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:37,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:40,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:44,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:47,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.1376, 'learning_rate': 8.557377049180327e-05, 'epoch': 8.46}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:51,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:54,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:58,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:02:01,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:02:05,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:02:08,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.1047, 'learning_rate': 8.508196721311476e-05, 'epoch': 8.47}
+{'loss': 0.0878, 'learning_rate': 8.459016393442622e-05, 'epoch': 8.48}
+{'loss': 0.0905, 'learning_rate': 8.40983606557377e-05, 'epoch': 8.48}
+ 85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
+{'loss': 0.0747, 'learning_rate': 8.360655737704916e-05, 'epoch': 8.49}
+{'loss': 0.0621, 'learning_rate': 8.311475409836065e-05, 'epoch': 8.5}
+{'loss': 0.075, 'learning_rate': 8.262295081967212e-05, 'epoch': 8.51}
+ 85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
+ 85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
+{'loss': 0.0683, 'learning_rate': 8.21311475409836e-05, 'epoch': 8.52}
+ 85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
+ 85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
+ 85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
+{'loss': 0.0557, 'learning_rate': 8.163934426229507e-05, 'epoch': 8.53}
+ 85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
+ 85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
+ 85%|████████████████████████████████████████████████████████████████▉ | 948/1110 [6:04:08<1:09:46, 25.84s/it]
+{'loss': 0.0588, 'learning_rate': 8.114754098360656e-05, 'epoch': 8.54}
+ 85%|████████████████████████████████████████████████████████████████▉ | 948/1110 [6:04:08<1:09:46, 25.84s/it]
+ 85%|████████████████████████████████████████████████████████████████▉ | 948/1110 [6:04:08<1:09:46, 25.84s/it]
+{'loss': 0.0597, 'learning_rate': 8.065573770491802e-05, 'epoch': 8.55}
+{'loss': 0.0598, 'learning_rate': 8.01639344262295e-05, 'epoch': 8.56}
+{'loss': 0.0542, 'learning_rate': 7.967213114754097e-05, 'epoch': 8.57}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.051, 'learning_rate': 7.918032786885245e-05, 'epoch': 8.57}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0587, 'learning_rate': 7.868852459016393e-05, 'epoch': 8.58}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0526, 'learning_rate': 7.81967213114754e-05, 'epoch': 8.59}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0481, 'learning_rate': 7.770491803278687e-05, 'epoch': 8.6}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 86%|███████████████████████████████████████████████████████████████████▏ | 956/1110 [6:07:16<58:33, 22.82s/it]
+{'loss': 0.0511, 'learning_rate': 7.721311475409836e-05, 'epoch': 8.61}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:04,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:09:12,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:17,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:17,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:17,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:17,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.038, 'learning_rate': 7.672131147540982e-05, 'epoch': 8.62} +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:24,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:24,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:28,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:30,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:09:30,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:34,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:37,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████████▎ | 958/1110 [6:07:55<53:13, 21.01s/it] Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████████▎ | 958/1110 [6:07:55<53:13, 21.01s/it] Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:41,023 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:43,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:09:45,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:45,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:45,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:51,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:53,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:55,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:57,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:09:57,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:09:59,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:00,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:02,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:04,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:06,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:08,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:10:11,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:11,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:13,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:14,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:17,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:19,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:21,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:10:23,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:23,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:25,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:27,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:29,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:31,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:33,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:10:33,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:35,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:37,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:39,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:39,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0697, 'learning_rate': 7.377049180327868e-05, 'epoch': 8.67} +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:43,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:43,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:10:46,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:46,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:50,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:50,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:53,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:57,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:10:57,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:01,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:01,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:04,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:08,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:08,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.1125, 'learning_rate': 7.327868852459015e-05, 'epoch': 8.68} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:11,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:11,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:15,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:15,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:18,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:22,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:22,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:25,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:25,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:31,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0963, 'learning_rate': 7.278688524590164e-05, 'epoch': 8.69} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0785, 'learning_rate': 7.229508196721311e-05, 'epoch': 8.7} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0818, 'learning_rate': 7.180327868852459e-05, 'epoch': 8.71} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0798, 'learning_rate': 7.131147540983606e-05, 'epoch': 8.72} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.064, 'learning_rate': 7.081967213114753e-05, 'epoch': 8.73} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0586, 'learning_rate': 7.032786885245901e-05, 'epoch': 8.74} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0594, 'learning_rate': 6.983606557377048e-05, 'epoch': 8.74} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0661, 'learning_rate': 6.934426229508197e-05, 'epoch': 8.75} +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:11:34,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0698, 'learning_rate': 6.885245901639344e-05, 'epoch': 8.76} + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0558, 'learning_rate': 6.836065573770492e-05, 'epoch': 8.77} + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0524, 'learning_rate': 6.786885245901639e-05, 'epoch': 8.78} + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0562, 'learning_rate': 6.737704918032786e-05, 'epoch': 8.79} + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0507, 'learning_rate': 6.688524590163934e-05, 'epoch': 8.8} + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0477, 'learning_rate': 6.639344262295081e-05, 'epoch': 8.81} +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0525, 'learning_rate': 6.590163934426228e-05, 'epoch': 8.82} +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:39,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:39,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:43,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:43,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:43,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:43,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:43,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:43,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:56,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:56,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:17:56,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:00,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:00,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:18:00,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:00,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:00,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:00,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:00,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:14,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:14,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 88%|████████████████████████████████████████████████████████████████████▉ | 981/1110 [6:16:35<49:05, 22.83s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▉ | 981/1110 [6:16:35<49:05, 22.83s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0446, 'learning_rate': 6.491803278688524e-05, 'epoch': 8.83} + 88%|████████████████████████████████████████████████████████████████████▉ | 981/1110 [6:16:35<49:05, 22.83s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▉ | 981/1110 [6:16:35<49:05, 22.83s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|████████████████████████████████████████████████████████████████████▉ | 981/1110 [6:16:35<49:05, 22.83s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:29,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:29,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:18:29,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:29,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:18:37,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████████ | 982/1110 [6:16:55<47:14, 22.14s/it] Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████████ | 982/1110 [6:16:55<47:14, 22.14s/it] Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0443, 'learning_rate': 6.442622950819672e-05, 'epoch': 8.84} + 88%|█████████████████████████████████████████████████████████████████████ | 982/1110 [6:16:55<47:14, 22.14s/it] Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:45,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:18:45,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:45,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:51,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:53,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:18:53,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:18:58,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:18:58,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0344, 'learning_rate': 6.393442622950819e-05, 'epoch': 8.85} +[WARNING|modeling_utils.py:388] 2022-03-28 23:19:01,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:19:04,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:19:04,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:19:04,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:19:04,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:19:12,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:19:12,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:15,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:15,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▏ | 984/1110 [6:17:33<43:05, 20.52s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:19,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:21,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:23,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:25,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:27,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:30,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▏ | 985/1110 [6:17:48<39:06, 18.77s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▏ | 985/1110 [6:17:48<39:06, 18.77s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:34,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:37,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:40,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:41,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▎ | 986/1110 [6:18:00<34:31, 16.71s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:19:44,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▎ | 986/1110 [6:18:00<34:31, 16.71s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:19:44,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:45,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:44,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:48,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:44,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:50,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:44,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:52,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:44,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:52,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:44,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:54,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:53,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:56,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:53,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:58,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:53,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▍ | 988/1110 [6:18:16<24:52, 12.24s/it] Setting `use_cache=False`...1] 2022-03-28 23:19:53,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▍ | 988/1110 [6:18:16<24:52, 12.24s/it] Setting `use_cache=False`...1] 2022-03-28 23:19:53,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 89%|█████████████████████████████████████████████████████████████████████▍ | 988/1110 [6:18:16<24:52, 12.24s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▍ | 988/1110 [6:18:16<24:52, 12.24s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:05,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:08,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:08,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:12,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:12,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:15,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:19,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:19,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:22,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:22,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:26,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▍ | 989/1110 [6:18:45<34:28, 17.10s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 89%|█████████████████████████████████████████████████████████████████████▍ | 989/1110 [6:18:45<34:28, 17.10s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████████▍ | 989/1110 [6:18:45<34:28, 17.10s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:33,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:33,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:36,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:36,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:40,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:43,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:43,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:46,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:46,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0814, 'learning_rate': 6.0491803278688514e-05, 'epoch': 8.91} +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0625, 'learning_rate': 5.9999999999999995e-05, 'epoch': 8.92} +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0625, 'learning_rate': 5.950819672131147e-05, 'epoch': 8.93} +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0568, 'learning_rate': 5.901639344262294e-05, 'epoch': 8.94} +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0485, 'learning_rate': 5.8524590163934416e-05, 'epoch': 8.95} +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0434, 'learning_rate': 5.8032786885245896e-05, 'epoch': 8.96} + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it] Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0434, 'learning_rate': 5.754098360655737e-05, 'epoch': 8.97} +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:23:37,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:23:37,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:23:37,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:23:37,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:46,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████████ | 997/1110 [6:22:04<43:13, 22.95s/it] Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████████ | 997/1110 [6:22:04<43:13, 22.95s/it] Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:23:50,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:23:50,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:23:50,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:23:50,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:57,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:59,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:01,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:03,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████████▏ | 998/1110 [6:22:21<39:34, 21.20s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████████▏ | 998/1110 [6:22:21<39:34, 21.20s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:07,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:10,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:11,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:14,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████████▏ | 999/1110 [6:22:32<33:49, 18.28s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████████▏ | 999/1110 [6:22:32<33:49, 18.28s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:18,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:19,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:21,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:21,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:25,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:25,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:28,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:32,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:32,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:35,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:35,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:39,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:43,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:43,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:43,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:43,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:43,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+03/28/2022 23:30:15 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
+{'eval_loss': 0.35239124298095703, 'eval_wer': 0.10420468068226894, 'eval_runtime': 326.5742, 'eval_samples_per_second': 8.09, 'eval_steps_per_second': 0.508, 'epoch': 9.01}
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...