diff --git "a/wandb/run-20220328_170142-by95ehra/files/output.log" "b/wandb/run-20220328_170142-by95ehra/files/output.log"
--- "a/wandb/run-20220328_170142-by95ehra/files/output.log"
+++ "b/wandb/run-20220328_170142-by95ehra/files/output.log"
@@ -12509,3 +12509,1325 @@
 {'eval_loss': 0.35239124298095703, 'eval_wer': 0.10420468068226894, 'eval_runtime': 326.5742, 'eval_samples_per_second': 8.09, 'eval_steps_per_second': 0.508, 'epoch': 9.01}
 [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642
 [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642
+[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642
+ Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0658, 'learning_rate': 5.5081967213114745e-05, 'epoch': 9.02}
+ Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0527, 'learning_rate': 5.4590163934426226e-05, 'epoch': 9.03}
+ Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it]
+{'loss': 0.0639, 'learning_rate': 5.40983606557377e-05, 'epoch': 9.04}
+ 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it]
+{'loss': 0.0622, 'learning_rate': 5.360655737704917e-05, 'epoch': 9.04}
+ 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it]
+{'loss': 0.0566, 'learning_rate': 5.3114754098360647e-05, 'epoch': 9.05}
+ 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it]
+ 91%|███████████████████████████████████████████████████████████████████▉ | 1006/1110 [6:32:57<1:23:58, 48.45s/it]
+{'loss': 0.0423, 'learning_rate': 5.262295081967213e-05, 'epoch': 9.06}
+ 91%|███████████████████████████████████████████████████████████████████▉ | 1006/1110 [6:32:57<1:23:58, 48.45s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0496, 'learning_rate': 5.21311475409836e-05, 'epoch': 9.07}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0438, 'learning_rate': 5.1639344262295074e-05, 'epoch': 9.08}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.04, 'learning_rate': 5.114754098360655e-05, 'epoch': 9.09}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0399, 'learning_rate': 5.065573770491803e-05, 'epoch': 9.1} +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0494, 'learning_rate': 5.01639344262295e-05, 'epoch': 9.11} +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0367, 'learning_rate': 4.9672131147540976e-05, 'epoch': 9.12} + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0387, 'learning_rate': 4.918032786885245e-05, 'epoch': 9.13} + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0372, 'learning_rate': 4.868852459016393e-05, 'epoch': 9.13} +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0334, 'learning_rate': 4.8196721311475404e-05, 'epoch': 9.14} +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0286, 'learning_rate': 4.770491803278688e-05, 'epoch': 9.15} +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:05,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:39:05,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:05,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:05,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:05,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0352, 'learning_rate': 4.721311475409836e-05, 'epoch': 9.16} +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:05,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:05,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:20,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:39:20,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:20,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:20,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:20,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:30,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:30,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:39:30,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0299, 'learning_rate': 4.672131147540983e-05, 'epoch': 9.17}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:39:36,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:39:40,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:39:44,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:39:48,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:39:50,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:39:55,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:39:58,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:40:00,877 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:40:03,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:40:05,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:40:07,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:40:07,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:40:09,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:40:11,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:40:13,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:40:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:40:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:40:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:20,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:22,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:22,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:24,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:25,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:29,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:30,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:32,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:34,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:34,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:36,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:39,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:41,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:42,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:44,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:44,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:46,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:49,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:51,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:51,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:52,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:52,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:56,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:56,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:00,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:03,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:03,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:07,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:07,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:10,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:10,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:14,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:17,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:17,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:21,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:21,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0687, 'learning_rate': 4.327868852459016e-05, 'epoch': 9.23}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:25,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:28,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:28,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:32,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:32,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:35,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:35,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:39,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.071, 'learning_rate': 4.2786885245901634e-05, 'epoch': 9.24}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0587, 'learning_rate': 4.229508196721311e-05, 'epoch': 9.25}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0558, 'learning_rate': 4.180327868852458e-05, 'epoch': 9.26}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0559, 'learning_rate': 4.131147540983606e-05, 'epoch': 9.27}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0525, 'learning_rate': 4.0819672131147536e-05, 'epoch': 9.28}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0428, 'learning_rate': 4.032786885245901e-05, 'epoch': 9.29} +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.05, 'learning_rate': 3.983606557377048e-05, 'epoch': 9.3} +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.043, 'learning_rate': 3.9344262295081964e-05, 'epoch': 9.3} +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0455, 'learning_rate': 3.885245901639344e-05, 'epoch': 9.31} +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0381, 'learning_rate': 3.836065573770491e-05, 'epoch': 9.32} + 93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:46:06,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:06,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:06,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:06,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0315, 'learning_rate': 3.786885245901639e-05, 'epoch': 9.33} + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0366, 'learning_rate': 3.7377049180327865e-05, 'epoch': 9.34} +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:41,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:41,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:41,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:41,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:46:41,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:52,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:52,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:52,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:46:52,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0286, 'learning_rate': 3.688524590163934e-05, 'epoch': 9.35} +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0401, 'learning_rate': 3.639344262295082e-05, 'epoch': 9.36} +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0402, 'learning_rate': 3.590163934426229e-05, 'epoch': 9.37} +[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:04,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:04,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:08,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:08,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:08,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:08,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 0.0349, 'learning_rate': 3.540983606557377e-05, 'epoch': 9.38}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:48:17,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:48:32,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:48:36,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:48:42,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:51,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0276, 'learning_rate': 3.442622950819672e-05, 'epoch': 9.39}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:57,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:03,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:49:07,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:11,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 1044/1110 [6:47:29<23:19, 21.21s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 23:49:15,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:19,257 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:21,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:23,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:25,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:27,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:29,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:31,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:33,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:35,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:37,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:41,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:43,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:45,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 94%|████████████████████████████████████████████████████████████████████████▌ | 1046/1110 [6:48:02<20:04, 18.82s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:46,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:50,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:51,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:53,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:55,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:57,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:00,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:01,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:03,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:05,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 94%|████████████████████████████████████████████████████████████████████████▋ | 1048/1110 [6:48:24<15:01, 14.54s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:08,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:09,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:12,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:14,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 95%|████████████████████████████████████████████████████████████████████████▊ | 1049/1110 [6:48:31<12:30, 12.30s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:15,970 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:19,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:23,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:26,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:30,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:34,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:37,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:41,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 95%|████████████████████████████████████████████████████████████████████████▊ | 1050/1110 [6:49:00<17:18, 17.31s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:48,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:51,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:55,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:58,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:51:02,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:51:05,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0621, 'learning_rate': 3.049180327868852e-05, 'epoch': 9.47}
+{'loss': 0.054, 'learning_rate': 2.9999999999999997e-05, 'epoch': 9.48}
+{'loss': 0.0524, 'learning_rate': 2.950819672131147e-05, 'epoch': 9.48}
+{'loss': 0.0471, 'learning_rate': 2.9016393442622948e-05, 'epoch': 9.49}
+{'loss': 0.048, 'learning_rate': 2.8524590163934422e-05, 'epoch': 9.5}
+{'loss': 0.0498, 'learning_rate': 2.80327868852459e-05, 'epoch': 9.51}
+{'loss': 0.0381, 'learning_rate': 2.7540983606557373e-05, 'epoch': 9.52}
+{'loss': 0.0503, 'learning_rate': 2.704918032786885e-05, 'epoch': 9.53}
+{'loss': 0.0317, 'learning_rate': 2.6557377049180323e-05, 'epoch': 9.54}
+{'loss': 0.0365, 'learning_rate': 2.60655737704918e-05, 'epoch': 9.55}
+{'loss': 0.0385, 'learning_rate': 2.5573770491803274e-05, 'epoch': 9.56}
+ 96%|█████████████████████████████████████████████████████████████████████████▋ | 1062/1110 [6:54:17<20:02, 25.06s/it]
+{'loss': 0.0298, 'learning_rate': 2.508196721311475e-05, 'epoch': 9.57}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:56:27,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 96%|█████████████████████████████████████████████████████████████████████████▊ | 1064/1110 [6:55:04<18:37, 24.29s/it]
+{'loss': 0.0314, 'learning_rate': 2.4098360655737702e-05, 'epoch': 9.58}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:56:58,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0316, 'learning_rate': 2.360655737704918e-05, 'epoch': 9.59}
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:57:37,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:57:42,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0386, 'learning_rate': 2.262295081967213e-05, 'epoch': 9.61}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:58:05,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:13,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0289, 'learning_rate': 2.2131147540983603e-05, 'epoch': 9.62}
+[WARNING|modeling_utils.py:388] 2022-03-28 23:58:21,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:58:23,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:58:29,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:58:31,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 1069/1110 [6:56:51<14:30, 21.23s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:58:36,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 23:58:39,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:58:42,073 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 23:58:44,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:48,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:50,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:52,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:54,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:56,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:58,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:59,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:04,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:07,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 96%|██████████████████████████████████████████████████████████████████████████▎ | 1071/1110 [6:57:25<12:18, 18.93s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:59:09,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:12,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:14,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:15,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:18,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:20,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:21,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:22,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:24,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:26,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:28,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 1073/1110 [6:57:47<09:04, 14.72s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:59:31,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:33,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:34,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:36,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 97%|██████████████████████████████████████████████████████████████████████████▌ | 1074/1110 [6:57:54<07:28, 12.46s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:43,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:46,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:50,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:50,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:53,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:57,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:57,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:00,868 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:00,868 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:04,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 97%|██████████████████████████████████████████████████████████████████████████▌ | 1075/1110 [6:58:23<10:09, 17.40s/it][WARNING|modeling_bart.py:1051] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:11,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:15,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:18,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:21,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0574, 'learning_rate': 1.819672131147541e-05, 'epoch': 9.69}
+{'loss': 0.0483, 'learning_rate': 1.7704918032786883e-05, 'epoch': 9.7}
+{'loss': 0.0521, 'learning_rate': 1.6721311475409834e-05, 'epoch': 9.72}
+{'loss': 0.0498, 'learning_rate': 1.622950819672131e-05, 'epoch': 9.73}
+{'loss': 0.0438, 'learning_rate': 1.5737704918032785e-05, 'epoch': 9.74}
+{'loss': 0.0459, 'learning_rate': 1.524590163934426e-05, 'epoch': 9.74}
+{'loss': 0.0403, 'learning_rate': 1.4754098360655736e-05, 'epoch': 9.75}
+{'loss': 0.0334, 'learning_rate': 1.4262295081967211e-05, 'epoch': 9.76}
+ 98%|███████████████████████████████████████████████████████████████████████████▎ | 1085/1110 [7:02:50<10:40, 25.63s/it]
+{'loss': 0.0457, 'learning_rate': 1.3770491803278686e-05, 'epoch': 9.77}
+{'loss': 0.0393, 'learning_rate': 1.3278688524590162e-05, 'epoch': 9.78}
+{'loss': 0.0364, 'learning_rate': 1.2786885245901637e-05, 'epoch': 9.79}
+{'loss': 0.0329, 'learning_rate': 1.2295081967213112e-05, 'epoch': 9.8}
+{'loss': 0.042, 'learning_rate': 1.180327868852459e-05, 'epoch': 9.81}
+{'loss': 0.0375, 'learning_rate': 1.1311475409836065e-05, 'epoch': 9.82}
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:06:46,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:06:50,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0371, 'learning_rate': 1.081967213114754e-05, 'epoch': 9.83}
+[WARNING|modeling_utils.py:388] 2022-03-29 00:07:00,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:07:04,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:07:16,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0285, 'learning_rate': 1.0327868852459016e-05, 'epoch': 9.83}
+[WARNING|modeling_utils.py:388] 2022-03-29 00:07:28,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:07:34,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0278, 'learning_rate': 9.836065573770491e-06, 'epoch': 9.84}
+[WARNING|modeling_utils.py:388] 2022-03-29 00:07:40,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:07:45,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:07:49,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:07:53,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:07:57,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:07:59,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:01,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:03,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:05,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:07,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:09,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:11,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:13,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:15,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:17,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-29 00:08:19,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:23,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:25,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:27,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████ | 1096/1110 [7:06:44<04:19, 18.50s/it][WARNING|modeling_bart.py:1051] 2022-03-29 00:08:28,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:30,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:28,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:33,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:28,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:34,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:28,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:37,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:28,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:39,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:28,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:41,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:40,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:43,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:40,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:46,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:40,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:48,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:40,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:50,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:49,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:52,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:49,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:53,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:49,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▏| 1099/1110 [7:07:12<02:12, 12.03s/it] Setting `use_cache=False`...1] 2022-03-29 00:08:49,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 99%|████████████████████████████████████████████████████████████████████████████▏| 1099/1110 [7:07:12<02:12, 12.03s/it][WARNING|modeling_bart.py:1051] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:00,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:04,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:04,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:07,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:07,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:11,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:11,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:15,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:18,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:18,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:21,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:21,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 1100/1110 [7:07:40<02:49, 16.94s/it] Setting `use_cache=False`...1] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▎| 1100/1110 [7:07:40<02:49, 16.94s/it][WARNING|modeling_bart.py:1051] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:28,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:28,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:32,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:32,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:35,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:39,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:39,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:42,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:42,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:45,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0564, 'learning_rate': 5.901639344262295e-06, 'epoch': 9.91} +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0421, 'learning_rate': 5.40983606557377e-06, 'epoch': 9.92} +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0421, 'learning_rate': 4.9180327868852455e-06, 'epoch': 9.93} +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0357, 'learning_rate': 4.426229508196721e-06, 'epoch': 9.94} + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0404, 'learning_rate': 3.934426229508196e-06, 'epoch': 9.95} +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.031, 'learning_rate': 3.4426229508196716e-06, 'epoch': 9.96} +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0362, 'learning_rate': 2.9508196721311474e-06, 'epoch': 9.97} +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it] Setting `use_cache=False`...1] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:34,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:34,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-29 00:12:34,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:41,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:41,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:41,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 0.0295, 'learning_rate': 2.4590163934426227e-06, 'epoch': 9.98} +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:41,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:49,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:51,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-29 00:12:53,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-29 00:12:53,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:12:57,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:12:59,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:01,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:01,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:03,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:04,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:07,640 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:08,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:11,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:11,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0308, 'learning_rate': 1.4754098360655737e-06, 'epoch': 10.0}
+[INFO|trainer.py:2114] 2022-03-29 00:13:12,718 >> Saving model checkpoint to ./=)compatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2114] 2022-03-29 00:13:24,702 >> Saving model checkpoint to ./=)compatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[INFO|trainer.py:2114] 2022-03-29 00:13:24,702 >> Saving model checkpoint to ./=)compatible with gradient checkpointing. Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...