sanchit-gandhi HF staff committed on
Commit
91fd052
1 Parent(s): 53c3642

End of training

all_results.json CHANGED
@@ -1,14 +1,14 @@
  {
      "epoch": 10.0,
-     "eval_loss": 0.3577839434146881,
-     "eval_runtime": 722.44,
+     "eval_loss": 4.286533355712891,
+     "eval_runtime": 769.3537,
      "eval_samples": 2642,
-     "eval_samples_per_second": 3.657,
-     "eval_steps_per_second": 0.458,
-     "eval_wer": 0.09315747719159063,
-     "train_loss": 1.290548439615828,
-     "train_runtime": 51756.4072,
+     "eval_samples_per_second": 3.434,
+     "eval_steps_per_second": 0.43,
+     "eval_wer": 1.5370686235620785,
+     "train_loss": 4.09313003856505,
+     "train_runtime": 51677.5666,
      "train_samples": 28538,
-     "train_samples_per_second": 5.514,
+     "train_samples_per_second": 5.522,
      "train_steps_per_second": 0.043
  }
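The throughput fields in all_results.json appear to follow from the sample counts and runtimes in the same file. The following is a minimal sketch of that arithmetic, not the Trainer's own code: it assumes samples-per-second is the number of samples processed (samples times epochs for training) divided by the runtime, and it assumes an evaluation batch size of 8 on a single device, as the training log suggests. The helper names are hypothetical.

```python
import math


def samples_per_second(num_samples: int, runtime_s: float, epochs: float = 1.0) -> float:
    """Throughput in samples per second, rounded to three decimals."""
    return round(num_samples * epochs / runtime_s, 3)


def steps_per_second(num_samples: int, runtime_s: float, batch_size: int) -> float:
    """Throughput in batches per second for a single pass over the data."""
    steps = math.ceil(num_samples / batch_size)  # the last batch may be partial
    return round(steps / runtime_s, 3)


# Values copied from the updated all_results.json.
train_sps = samples_per_second(28538, 51677.5666, epochs=10.0)
eval_sps = samples_per_second(2642, 769.3537)
# batch_size=8 is an assumption taken from the log, not from the JSON file.
eval_steps = steps_per_second(2642, 769.3537, batch_size=8)

print(train_sps, eval_sps, eval_steps)
```

Under these assumptions the three values agree with the reported 5.522, 3.434 and 0.43, which suggests the changed throughput numbers reflect only the slightly different runtimes of the two runs.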
eval_results.json CHANGED
@@ -1,9 +1,9 @@
  {
      "epoch": 10.0,
-     "eval_loss": 0.3577839434146881,
-     "eval_runtime": 722.44,
+     "eval_loss": 4.286533355712891,
+     "eval_runtime": 769.3537,
      "eval_samples": 2642,
-     "eval_samples_per_second": 3.657,
-     "eval_steps_per_second": 0.458,
-     "eval_wer": 0.09315747719159063
+     "eval_samples_per_second": 3.434,
+     "eval_steps_per_second": 0.43,
+     "eval_wer": 1.5370686235620785
  }
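An eval_wer above 1.0 is possible because word error rate counts insertions as well as substitutions and deletions, so a hypothesis much longer than its reference can contain more errors than the reference has words. The following is a minimal sketch using a plain word-level edit distance. It is an illustration only, not the metric implementation used by the training script, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Four inserted words against a two-word reference give a WER of 2.0.
print(word_error_rate("the cat", "the the cat cat sat down"))
```

A model that emits long, repetitive transcripts can therefore score well above 100% WER, which would be consistent with the jump from about 0.093 to about 1.537 alongside the much higher losses in this commit.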
runs/Mar27_21-02-01_sanchit--v100/events.out.tfevents.1648467705.sanchit--v100.3678730.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04038ddfa8091403c9b14ae8d9465dd597bdee16e91185e9d119df1ed1424814
+ size 358
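The three added lines are a Git LFS pointer: the repository stores this small text stub, and the 358-byte TensorBoard event file itself lives in LFS storage. The following is a minimal sketch of reading such a pointer, assuming the simple "key value" line layout shown in the hunk. The function name and the returned dictionary keys are hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:04038ddfa8091403c9b14ae8d9465dd597bdee16e91185e9d119df1ed1424814
size 358
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```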
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
      "epoch": 10.0,
-     "train_loss": 1.290548439615828,
-     "train_runtime": 51756.4072,
+     "train_loss": 4.09313003856505,
+     "train_runtime": 51677.5666,
      "train_samples": 28538,
-     "train_samples_per_second": 5.514,
+     "train_samples_per_second": 5.522,
      "train_steps_per_second": 0.043
  }
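The train_runtime field is stored in seconds, while the train metrics block in the training log prints the same quantity as 14:21:17.56. The following is a minimal sketch of that conversion. It assumes the fractional seconds are truncated rather than rounded, which matches the logged value but is not confirmed by the source. The function name is hypothetical.

```python
def format_runtime(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS.hh, truncating the hundredths."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    whole_secs = int(secs)
    hundredths = int((secs - whole_secs) * 100)  # truncate, do not round
    return f"{int(hours)}:{int(minutes):02d}:{whole_secs:02d}.{hundredths:02d}"


# Value copied from the updated train_results.json.
print(format_runtime(51677.5666))
```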
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
wandb/run-20220327_210229-2wif55w7/files/output.log CHANGED
@@ -27471,3 +27471,355 @@ Upload file wandb/run-20220327_210229-2wif55w7/logs/debug-internal.log: 100%|█
  Upload file wandb/run-20220327_210229-2wif55w7/logs/debug-internal.log: 100%|██████| 10.0M/10.0M [02:23<00:00, 73.1kB/s]
  03/28/2022 11:28:20 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
  [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ [INFO|modelcard.py:460] 2022-03-28 11:28:22,581 >> Dropping the following result as it does not have all the necessary fields:
+ 03/28/2022 11:28:53 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
+ cd9d427..53c3642 main -> main
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8
+ ***** train metrics *****
+ epoch = 10.0
+ train_loss = 4.0931
+ train_runtime = 14:21:17.56
+ train_samples = 28538
+ train_samples_per_second = 5.522
+ train_steps_per_second = 0.043
+ 03/28/2022 11:28:56 - INFO - __main__ - *** Evaluate ***
27499
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27500
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27501
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27502
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27503
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27504
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27505
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27506
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27507
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27508
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27509
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27510
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27511
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27512
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27513
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27514
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27515
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27516
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27517
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27518
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27519
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27520
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27521
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27522
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27523
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27524
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27525
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27526
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27527
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27528
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27529
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27530
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27531
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27532
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27533
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27534
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27535
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27536
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27537
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27538
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27539
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27540
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27541
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27542
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27543
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27544
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27545
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27546
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27547
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27548
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27549
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27550
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27551
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27552
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27553
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27554
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27555
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27556
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27557
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27558
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27559
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27560
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27561
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27562
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27563
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27564
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27565
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27566
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27567
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27568
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27569
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27570
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27571
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27572
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27573
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27574
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27575
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27576
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27577
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27578
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27579
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27580
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27581
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27582
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27583
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27584
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27585
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27586
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27587
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27588
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27589
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27590
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27591
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27592
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27593
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27594
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27595
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27596
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27597
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27598
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27599
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27600
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27601
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27602
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27603
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27604
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27605
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27606
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27607
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27608
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27609
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27610
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27611
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27612
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27613
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27614
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27615
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27616
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27617
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27618
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27619
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27620
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27621
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27622
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27623
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27624
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27625
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27626
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27627
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27628
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27629
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27630
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27631
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27632
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27633
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27634
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27635
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27636
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27637
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27638
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27639
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27640
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27641
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27642
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27643
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27644
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27645
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27646
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27647
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27648
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27649
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27650
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27651
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27652
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27653
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27654
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27655
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27656
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27657
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27658
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27659
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27660
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27661
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27662
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27663
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27664
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27665
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27666
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27667
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27668
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27669
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27670
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27671
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27672
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27673
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27674
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27675
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27676
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27677
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27678
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27679
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27680
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27681
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27682
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27683
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27684
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27685
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27686
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27687
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27688
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27689
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27690
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8llowing result as it does not have all the necessary fields:t operations will not be computed-28 11:13:17,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27816
+ 03/28/2022 11:41:45 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
27817
+ ***** eval metrics *****
27818
+ epoch = 10.0
27819
+ eval_loss = 4.2865
27820
+ eval_runtime = 0:12:49.35
27821
+ eval_samples = 2642
27822
+ eval_samples_per_second = 3.434
27823
+ eval_steps_per_second = 0.43
27824
+ eval_wer = 1.5371
27825
+ [INFO|trainer.py:2369] 2022-03-28 11:28:56,189 >> Batch size = 8
wandb/run-20220327_210229-2wif55w7/files/wandb-summary.json CHANGED
The diff for this file is too large to render. See raw diff
wandb/run-20220327_210229-2wif55w7/logs/debug-internal.log CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4912da4038f46f172f0861384951ad0222149eeb32c34f25dc63b9722538a1e4
3
- size 10533869
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ddba4cd665207dacf7b237e7f4996b0c3cc8f003d7b6a7448b8edbbdfd7ad057
3
+ size 10622676
wandb/run-20220327_210229-2wif55w7/run-2wif55w7.wandb CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:bdb5f3e6744139ec5f6ecfd51fdbe2bf0c9580b1bc29fc28793fbfa2e86af628
3
- size 456420437
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0bf45dfd9633a95ac1008847ad5ccda04c7ad13826143cbd3fc96d98bc6b736a
3
+ size 456540236